Stefcal crashes with a completely flagged slot #78

Open
bennahugo opened this issue Dec 21, 2017 · 5 comments

@bennahugo
Contributor

Sharmila reported this. On one particularly bad dataset, a previous round of selfcal flagged very aggressively, and the chisq comes out as 0. Why is this an integer?

2022.12 47.3Gb gainopts(StefCal.py:752:get_result): ('00', '01') data type of model is complex128
2022.12 47.3Gb gainopts(StefCal.py:777:get_result): G: solvable 1 from major loop 0 (current 0)
2030.97 47.3Gb gainopts(StefCal.py:1276:run_gain_solution): solving for G, initial chisq is 51487312.2275
2060.32 47.3Gb gainopts(StefCal.py:1287:run_gain_solution): iter 1: 0.00% (0/62) conv, 0 gfs, max update 3.44259
2090.19 47.3Gb gainopts(StefCal.py:1287:run_gain_solution): iter 2: 0.00% (0/62) conv, 0 gfs, max update 0.196195
2119.80 47.3Gb gainopts(StefCal.py:1287:run_gain_solution): iter 3: 0.00% (0/62) conv, 0 gfs, max update 0.111077
2154.24 47.3Gb gainopts(StefCal.py:1287:run_gain_solution): iter 4: 100.00% (62/62) conv, 868 gfs, max update 0
2160.75 47.3Gb gainopts(StefCal.py:1323:run_gain_solution): G converged at chisq 51487312.2275 (last gain update 0) after 4 iterations and 129.77s
2160.75 47.3Gb gainopts(StefCal.py:1369:run_gain_solution):   delta-chisq were 
2160.75 47.3Gb gainopts(StefCal.py:1370:run_gain_solution):   convergence criteria were 3.4 0.2 0.11 0
2160.76 47.3Gb gainopts(StefCal.py:1393:run_gain_solution): flagged gains per antenna: 00 574200.00%, 01 574200.00%, 03 574200.00%, 06 574200.00%, 07 574200.00%, 08 574200.00%, 10 574200.00%, 12 574200.00%, 14 574200.00%, 15 574200.00%, 17 574200.00%, 18 574200.00%, 21 574200.00%, 31 574200.00%
2161.16 47.3Gb gainopts(StefCal.py:786:get_result): applying G-inverse to data
2163.35 47.8Gb gainopts(StefCal.py:790:get_result): done
2163.35 47.8Gb gainopts(StefCal.py:777:get_result): B: solvable 1 from major loop 0 (current 0)
2171.77 47.8Gb gainopts(StefCal.py:1276:run_gain_solution): solving for B, initial chisq is 0
2207.46 47.9Gb gainopts(StefCal.py:1287:run_gain_solution): iter 1: 1.01% (58/5742) conv, 0 gfs, max update 0.5353
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/Cattery/Calico/OMS/StefCal/StefCal.py", line 783, in get_result
    flagged |= self.run_gain_solution(opt,model,data,weight,bitflags,flag_null_gains=True,looptype=looptype);
  File "/usr/local/lib/python2.7/dist-packages/Cattery/Calico/OMS/StefCal/StefCal.py", line 1300, in run_gain_solution
    dchi = (chisq0-chisq)/chisq;
ZeroDivisionError: integer division or modulo by zero
1.7Gb meqserver(meqserver.py:288:stop_default_mqs): meqserver not exited yet, waiting another 10 seconds
/home/sharmila/output/1491862657-corr-cal2.gain.cp does not exist, so not trying to remove
/home/sharmila/output/1491862657-corr-cal2.gain1.cp does not exist, so not trying to remove
input flagmask is
### Job result: None
### No more commands
### meqserver reported 25 error(s) during the run:
###   000: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.0.1)
###   001: node 'VisDataMux': error processing tile 0.0.0.0.1.1
###   002: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.1.1)
###   003: node 'VisDataMux': error processing tile 0.0.0.0.1.2
###   004: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.2.1)
###   005: node 'VisDataMux': error processing tile 0.0.0.0.1.3
###   006: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.3.1)
###   007: node 'VisDataMux': error processing tile 0.0.0.0.1.4
###   008: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.4.1)
###   009: node 'VisDataMux': error processing tile 0.0.0.0.1.5
###   010: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.5.1)
###   011: node 'VisDataMux': error processing tile 0.0.0.0.1.6
###   012: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.6.1)
###   013: node 'VisDataMux': error processing tile 0.0.0.0.1.7
###   014: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.7.1)
###   015: node 'VisDataMux': error processing tile 0.0.0.0.1.8
###   016: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.8.1)
###   017: node 'VisDataMux': error processing tile 0.0.0.0.1.9
###   018: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.9.1)
###   019: node 'VisDataMux': error processing tile 0.0.0.0.1.10
###   020: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.10.1)
###   021: node 'VisDataMux': error processing tile 0.0.0.0.1.11
###   022: node 'stefcal': <2 exceptions> (while getting result for request ev.0.0.0.11.1)
###   023: node 'VisDataMux': error processing footer 0.0.0
###   024: node 'VisDataMux': execute() failed: integer division or modulo by zero (return code 0x810021)
### Stopping the meqserver
### All your batch are not belong to us, returning with error code
Traceback (most recent call last):
  File "/code/run.py", line 256, in <module>
    run_meqtrees(msname)
  File "/code/run.py", line 220, in run_meqtrees
    utils.xrun(cab['binary'], args + ['-s {}'.format(saveconf) if saveconf else ''])
  File "/utils/utils/__init__.py", line 74, in xrun
    raise SystemError('%s: returns errr code %d'%(command, process.returncode))
SystemError: /usr/bin/meqtree-pipeliner.py: returns errr code 1
@bennahugo bennahugo added the bug label Dec 21, 2017
@IanHeywood
Contributor

I see the division-by-zero message a lot and assumed it was due to fully flagged tiles, but it has never caused a bail-out for me. Then again, I've never solved for B.

/usr/local/lib/python2.7/dist-packages/Cattery/

Maybe it's been fixed in a more recent version than whatever repo this was installed from?

@bennahugo
Contributor Author

Hmm, yeah, I will have to dig. It is probably a B-Jones thing. We solve for G (so we take the DC out), then for B in chunks to get the SNR. This usually gives better results than just one G term. This is as recent as it gets: KERN 3.

@o-smirnov
Contributor

It's initialized as integer 0, then accumulated... so yeah, if an entire tile is flagged, it just remains integer 0.
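
A minimal sketch of that failure mode (hypothetical names, not the actual StefCal code): the accumulator starts as integer 0 and only unflagged visibilities contribute to it, so a completely flagged slot leaves it at exactly integer 0, and the delta-chisq division then fails just like in the traceback above.

```python
import numpy as np

def accumulate_chisq(residuals, flags):
    """Sketch of a chi-squared accumulator: starts as integer 0 and only
    sums contributions from unflagged visibilities."""
    chisq = 0  # integer zero; stays exactly this if everything is flagged
    for res, flagged in zip(residuals, flags):
        if not flagged:
            chisq += abs(res) ** 2
    return chisq

residuals = np.array([1.0 + 2.0j, 0.5 - 0.5j])
all_flagged = np.array([True, True])                # completely flagged slot

chisq0 = accumulate_chisq(residuals, all_flagged)   # integer 0
chisq = accumulate_chisq(residuals, all_flagged)    # integer 0

dchi = (chisq0 - chisq) / chisq  # ZeroDivisionError, as in the traceback
```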

@o-smirnov
Contributor

I need to add a check for this so it doesn't just fall over stupidly.
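
Something like the guard below would do it (a sketch with a hypothetical helper name, not the actual fix): if the chi-squared is still zero because the whole slot is flagged, skip the relative delta-chisq instead of dividing by zero.

```python
def delta_chisq(chisq0, chisq):
    """Sketch of a guarded relative chi-squared change (hypothetical helper,
    not the actual StefCal fix)."""
    if chisq == 0:
        # Completely flagged slot: nothing unflagged contributed, so there is
        # no meaningful relative change; report no improvement rather than
        # dividing by zero.
        return 0.0
    return (chisq0 - chisq) / float(chisq)
```

Alternatively, a fully flagged slot could be flagged out before the iteration loop starts, so the solver never tries to converge on data that isn't there.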

@ludwigschwardt

ludwigschwardt commented Dec 25, 2017 via email
