
Process CCD Notebook #34

Open
kadrlica opened this issue May 16, 2018 · 10 comments

@kadrlica (Contributor)

This is mostly my attempt to run processEimage.py on the Twinkles data. It has ended up not being as easy as I hoped.

@drphilmarshall (Contributor) commented May 16, 2018

"It has ended up not being as easy as I hoped."

First contender for Stack Club tagline? :-)

Do you want to submit a PR and use it to point out some sticking points? Might be good for all of us watching to appreciate the challenges. Thanks for taking a crack at this!

@kadrlica (Contributor, Author) commented May 16, 2018

The current version of the notebook can be found here, but I'm not sure it's ready for a PR. The notebook is not fully functional, and the GitHub PR interface shows changes to the raw JSON, which I don't find especially enlightening (though maybe I'm missing something?).

Some of the sticking points:

  • processEimage.py currently relies on Python 2, but the Stack versions on JupyterLab are all Python 3. I've created my own version of processEimage.py that updates the Python version, and it appears to work, but it can't be distributed with the notebook. In the notebook I instead interface with the ProcessEimageTask API directly, which is clunky (see the sketch after this list).
  • Use of the old-style astrometry_net_data reference catalogs, which are not installed in current versions of the Stack (UNRESOLVED).
  • Larger memory use than a tiny JupyterLab instance can support (this can be solved by choosing medium).
  • My general opinion is that running something that takes several minutes to complete doesn't seem optimal for the notebook format.
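
For reference, a minimal sketch of the in-notebook task invocation mentioned in the first bullet, assuming ProcessEimageTask is importable from the obs_lsstSim package; the module path, repo paths, and data ID are assumptions, not values taken from the notebook:

```python
# Hypothetical sketch: drive the Gen2 command-line task from Python instead of
# the Python 2-only processEimage.py script. Module path and data ID are assumed.
from lsst.obs.lsstSim.processEimage import ProcessEimageTask  # assumed location

# parseAndRun accepts the same arguments the command-line script would:
# an input repo, an output repo, and a data ID selection.
ProcessEimageTask.parseAndRun(args=[
    "input_data",                                  # placeholder input repo
    "--output", "output_data",                     # placeholder output repo
    "--id", "visit=230", "raft=2,2", "sensor=1,1", # example Twinkles-like data ID (assumed)
])
```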

@drphilmarshall (Contributor)

Thanks, Alex.

  • processEimage.py relying on Python 2 sounds like a bad sign: it could be falling behind the Stack, perhaps through disuse. Is processCcd.py itself Python 3-compatible? If so, shelving the processEimage.py part of the tutorial and using a different (and raw) test image might be the way to go, I guess. Would that solve the reference catalog problem as well?

  • The memory issue looks serious. Suppose we insist on being able to run processCcd.py from a tutorial notebook, @SimonKrughoff: what would the implications be for the NCSA accounts? Is medium an allowable option for Stack Club members?

A couple of minutes of runtime is awkwardly long for a live run-through in front of an audience, but for a notebook that you work through at home I think it's fine (as long as the text tells you that you'll need to wait!). The other thing we could do is have the results of a pre-baked run on hand, which would be helpful for comparing results and would also let the user skip the time-consuming part and get straight to the later section where the results of the task are displayed and explained.
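
A minimal sketch of what that "pre-baked" path might look like in the notebook, assuming a previously populated Gen2 output repository; the repo path and data ID below are placeholders, not actual Stack Club data:

```python
# Hypothetical: read a previously processed exposure instead of re-running the task.
from lsst.daf.persistence import Butler

butler = Butler("pre_baked_output")            # assumed path to a pre-run output repo
dataId = {"visit": 903334, "ccd": 16}          # example HSC-style data ID (assumed)
calexp = butler.get("calexp", dataId=dataId)   # calibrated exposure produced by processCcd
src = butler.get("src", dataId=dataId)         # source catalog from the same run
print(len(src), "sources measured on visit", dataId["visit"])
```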

Regarding PR threads for notebooks: I agree it's not ideal that we can only make line comments on the raw notebook code, but the PR thread does at least provide a "view" button where you can see the rendered notebook (so it saves you pasting links to the file in the branch). And then of course it enables the usual code review conversation in place, as well as tracking all the contributions made. I'd say that notebook is ready for a PR, no problem; you should just flag it as not yet ready to merge in the initial comment. (Personally I like the workflow where you make the PR as soon as the dev branch is made in the base repo, so that the branch has an explanation for its existence.)

@kadrlica (Contributor, Author)

As far as I know, the Twinkles and HSC datasets are the best curated. My personal thinking was that it was better to use simulated LSST images than real HSC data. If we use the HSC dataset, we should be able to follow the v15.0 tutorial line for line, but I'm not sure we want to do that.

The Python 3 compatibility fix is a one-line change to the shebang. I've already submitted an issue on this here. We also might be able to get away with something smaller than medium, but I haven't checked.
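
For concreteness, the kind of one-line change being described, assuming the script's shebang currently resolves to a Python 2 interpreter:

```python
#!/usr/bin/env python3
# The rest of processEimage.py would be unchanged; previously the first line read
# "#!/usr/bin/env python", which picks up Python 2 on the JupyterLab images.
```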

@SimonKrughoff (Contributor)

  1. I think running on medium should be fine. The cluster is fairly large, so I don't think we'll run out of CPUs. I think we will learn a lot about how k8s packs jobs onto nodes and how to configure things correctly.
  2. Diffing notebooks is a known issue. There is a package called nbdime that looks promising, but it is not integrated into the GitHub UI (see the sketch after this list).
  3. The Python 2 problem should just be fixed. I'll try to get to that tomorrow and then push it to the fork.
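
A minimal sketch of how nbdime could be used locally until something like it lands in the GitHub UI; the notebook filenames are placeholders, and the Python API shown is an assumption to check against the nbdime docs (the nbdiff command-line tool does the same job):

```python
# Hypothetical: content-aware diff of two versions of a notebook with nbdime.
import nbformat
import nbdime

old = nbformat.read("ProcessEimage_old.ipynb", as_version=4)  # placeholder filenames
new = nbformat.read("ProcessEimage_new.ipynb", as_version=4)

diff = nbdime.diff_notebooks(old, new)  # assumed API entry point
print(len(diff), "top-level changes between the two notebooks")
```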

@SimonKrughoff (Contributor)

I never got to item 3 above, but it is on the branch that Heather Kelly is about to merge.

@drphilmarshall added the project label on Jul 11, 2018
@kadrlica (Contributor, Author) commented Aug 10, 2018

ProcessEimage is now broken with the following error:

---------------------------------------------------------------------------
ParentsMismatch                           Traceback (most recent call last)
/opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/daf_persistence/16.0/python/lsst/daf/persistence/butler.py in _setAndVerifyParentsLists(self, repoDataList)
    925                     try:
--> 926                         repoData.cfg.extendParents(parents)
    927                     except ParentsMismatch as e:

/opt/lsst/software/stack/stack/miniconda3-4.3.21-10a4fa6/Linux64/daf_persistence/16.0/python/lsst/daf/persistence/repositoryCfg.py in extendParents(self, newParents)
    181                                   "existing parents list in this RepositoryCfg: {}").format(
--> 182                                   newParents, self._parents))
    183 

ParentsMismatch: The beginning of the passed-in parents list: [RepositoryCfg(root='../../../../../DMStackClub/ImageProcessing/DATA', mapper=<class 'lsst.obs.lsstSim.lsstSimMapper.LsstSimMapper'>, mapperArgs={}, parents=[], policy=None), RepositoryCfg(root='../..', mapper=<class 'lsst.obs.lsstSim.lsstSimMapper.LsstSimMapper'>, mapperArgs={}, parents=[], policy=None)] does not match the existing parents list in this RepositoryCfg: [RepositoryCfg(root='/home/kadrlica/notebooks/DMStackClub/ImageProcessing/DATA', mapper=<class 'lsst.obs.lsstSim.lsstSimMapper.LsstSimMapper'>, mapperArgs={}, parents=[], policy=None)]
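
For what it's worth, one workaround sometimes used for this kind of Gen2 Butler ParentsMismatch is to construct the Butler with absolute paths (or to clear the stale repositoryCfg.yaml in the output repo) so the parents recorded on disk agree between runs. A hedged sketch, with placeholder paths rather than the actual notebook layout:

```python
# Hypothetical workaround sketch: use absolute repo paths so the parents recorded
# in the output repositoryCfg.yaml match on re-runs. Paths are placeholders.
import os
from lsst.daf.persistence import Butler

input_repo = os.path.abspath("DATA")                     # assumed input repository
output_repo = os.path.abspath("DATA/rerun/eimage_test")  # assumed output repository

butler = Butler(inputs=input_repo, outputs=output_repo)
```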

@kadrlica (Contributor, Author)

I've decided to give up on ProcessEimage and move to ProcessCcd with the HSC data.
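
A minimal sketch of what the ProcessCcd-on-HSC path could look like from a notebook, following the same Gen2 pattern as above; the repository path, rerun name, and data ID are placeholders in the style of the pipelines tutorial, not confirmed values:

```python
# Hypothetical: run the Gen2 ProcessCcdTask on an HSC repo from inside a notebook.
from lsst.pipe.tasks.processCcd import ProcessCcdTask

ProcessCcdTask.parseAndRun(args=[
    "DATA",                            # placeholder path to the HSC Butler repo
    "--rerun", "processCcdOutputs",    # placeholder rerun name
    "--id", "visit=903334", "ccd=16",  # example HSC data ID (assumed)
])
```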

@kadrlica (Contributor, Author)

Simon suggests that some modifications of the configuration parameters could be useful (a sketch follows the list below):

  • Change the binning of the background model
  • Change the PSF model
  • Change the source detection threshold
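
A hedged sketch of the kinds of overrides those bullets point at, using the ProcessCcdTask config tree; the field names and values shown are assumptions to verify against the installed Stack version:

```python
# Hypothetical config overrides for the three suggestions above. Field names and
# values are illustrative only.
from lsst.pipe.tasks.processCcd import ProcessCcdTask

config = ProcessCcdTask.ConfigClass()

# 1. Binning of the background model: larger bins give a smoother background.
config.charImage.background.binSize = 256

# 2. PSF model: select which PSF determiner image characterization uses.
config.charImage.measurePsf.psfDeterminer.name = "pca"  # assumed registry choice

# 3. Source detection threshold (in sigma) for the calibration detection pass.
config.calibrate.detection.thresholdValue = 10.0
```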

He also suggests looking at Jim Bosch's Leon notebook:

@kadrlica (Contributor, Author)

He also suggests Jonathan's documentation on configuration:
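
In the same spirit, configuration overrides can also be passed through the command-line-style interface rather than set in Python; a sketch, with the repo, rerun name, data ID, and override values assumed for illustration:

```python
# Hypothetical: the same kind of overrides expressed as command-line-style config
# options handed to parseAndRun (equivalent to processCcd.py ... -c key=value).
from lsst.pipe.tasks.processCcd import ProcessCcdTask

ProcessCcdTask.parseAndRun(args=[
    "DATA", "--rerun", "configTest",     # placeholder repo and rerun
    "--id", "visit=903334", "ccd=16",    # example data ID (assumed)
    "-c", "charImage.background.binSize=256",
          "calibrate.detection.thresholdValue=10.0",
])
```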

@drphilmarshall added the Image Processing label on Sep 27, 2018