[BUG] Switching device from default.qubit to qiskit.aer makes workflow non-differentiable #5909
Comments
Thanks for the report @DanielNino27. The differences you are seeing between the two devices come down to the choice of diff method.
I also noticed you are trying to calculate a hessian. To calculate a hessian with parameter shift, you will need to specify `max_diff=2` when creating the QNode. It looks like your QNode left that at the default of `max_diff=1`. Hope that helps :)
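(For illustration, a minimal sketch of a QNode set up this way — the circuit and parameters below are placeholders, not the code from the report:)

```python
import pennylane as qml
from pennylane import numpy as np

dev = qml.device("default.qubit", wires=2)

# max_diff=2 lets the parameter-shift rule be applied recursively,
# which is required for second derivatives such as the hessian.
@qml.qnode(dev, diff_method="parameter-shift", max_diff=2)
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

params = np.array([0.1, 0.2], requires_grad=True)

# The hessian is the jacobian of the gradient.
hess = qml.jacobian(qml.grad(circuit))(params)
```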
Thanks for the suggestion, @albi3ro. I tried the changes you suggested and the warning indeed goes away, but the computation seems to get stuck with diff_method='parameter-shift' and default.qubit. I haven't been able to complete an iteration of optimization (I left it running for an hour and it still hadn't completed the first iteration), so it seems to take several orders of magnitude longer than with backpropagation at least, if it isn't simply getting stuck somewhere along the way. My understanding is that the difference shouldn't be so large between parameter-shift and backpropagation for this example - is that the case?
So I'm running the example with null.qubit. First-order parameter shift produces two (or sometimes four or more) executions per trainable parameter. If we have 10 parameters, that means 20 first-order gradient tapes. When taking a second-order derivative, we have to calculate the derivative of each parameter for each gradient tape. That means 20 hessian tapes per gradient tape. We are now at 1 initial execution + 20 first-order tapes + 400 hessian tapes = 421 total executions.

Now, caching does occur by default in PennyLane with higher-order derivatives, so some of those are indeed going to be duplicates. So with caching, I think we bring that down to 1 + 2N + (N+1) + 4N(N-1), which for N = 10 is 1 + 20 + 11 + 360 = 392. So be wary.

Also, it looks like your loss function is the hessian. So you should actually be calculating third-order derivatives if you want to use gradient-based optimization, which would then be around 8,000 tapes... but caching will probably play a much larger role at that point.

I'll provide more information when I get it, but I believe this is the source of your problem.
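(As a rough sketch of how such counts can be verified: null.qubit skips any real simulation, and `qml.Tracker` records how many device executions occur. The 10-parameter circuit below is a stand-in, not the code from this thread, so the exact count it prints may differ:)

```python
import pennylane as qml
from pennylane import numpy as np

# null.qubit performs no actual simulation, so it is a cheap way
# to count how many tape executions a workflow triggers.
dev = qml.device("null.qubit", wires=2)

@qml.qnode(dev, diff_method="parameter-shift", max_diff=2)
def circuit(params):
    # 10 trainable parameters, matching the counting argument above
    for i in range(10):
        qml.RX(params[i], wires=i % 2)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

params = np.array([0.1] * 10, requires_grad=True)

# The tracker tallies every device execution performed while
# the hessian (jacobian of the gradient) is computed.
with qml.Tracker(dev) as tracker:
    qml.jacobian(qml.grad(circuit))(params)

print(tracker.totals["executions"])
```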
And confirmed: running the example, we had 466 executions occur.
Expected behavior
With default.qubit, the cost function is successfully optimized with no warning. Output is as attached.
I would expect similar output with the qiskit.aer device.
Actual behavior
When the optimization loop runs with the qiskit.aer device, it gives the following warning:
```
.../anaconda3/envs/qml_env/Lib/site-packages/autograd/tracer.py:14: UserWarning: Output seems independent of input.
  warnings.warn("Output seems independent of input.")
```
The optimization still runs, and the cost function is actually lower than with default.qubit, but that result is likely illusory, as something along the way becomes non-differentiable with the qiskit device.
Additional information
No response
Source code
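(The reporter's snippet was not preserved in this copy of the thread. Purely as a hypothetical illustration of the kind of workflow described — every name below is a placeholder — the device line is the only thing that changes between the two behaviours:)

```python
import pennylane as qml
from pennylane import numpy as np

# Hypothetical reproduction sketch, not the reporter's original code.
dev = qml.device("default.qubit", wires=2)   # optimizes cleanly
# dev = qml.device("qiskit.aer", wires=2)    # emits the autograd warning
#                                            # (assumes pennylane-qiskit is installed)

@qml.qnode(dev)  # no diff_method given, so each device picks its own default
def circuit(params):
    qml.RX(params[0], wires=0)
    qml.RY(params[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))

# Placeholder cost; the thread indicates the real loss was built
# from the hessian of the circuit, which is far more expensive.
def cost(params):
    return circuit(params) ** 2

opt = qml.GradientDescentOptimizer(stepsize=0.1)
params = np.array([0.4, 0.6], requires_grad=True)
for _ in range(20):
    params = opt.step(cost, params)
```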
Tracebacks
No response
System information
Existing GitHub issues