# Overview of `make_cube.py`

The `make_cube.py` script houses two classes: `CubeFactory` and `TicaCubeFactory`. Both classes generate a cube from a batch of TESS full frame images (FFIs) that correspond to a single TESS sector/camera/CCD observation. There are two structurally different FFI product types that you can generate cubes from, which is why there is one class per FFI type: `CubeFactory` generates a cube from a batch of Mission-delivered FFIs (SPOC FFIs), and `TicaCubeFactory` generates a cube from a batch of TESS Image CAlibrator FFIs (TICA High-Level Science Product FFIs). There are advantages to using each type, which are outlined in the TESSCut API documentation. This wiki page will walk you through the `make_cube.py` functionality.
## `CubeFactory`

The `CubeFactory` class takes in a batch of Mission-delivered (SPOC) full frame images and generates a 4-dimensional cube from them, which is then used by `CutoutFactory` to generate a cutout target pixel file (TPF) for a designated target or set of coordinates. The main method of this class is `make_cube`, which calls upon the other methods to generate the cube and store it in a FITS file. The order in which the methods are called is as follows:

1. `__init__`
2. `_configure_cube`
3. `_build_info_table`
4. `_build_cube_file`
5. `_write_block`
6. `_write_info_table`

The functionality of each of these methods is discussed briefly in the sections that follow. Should there be any questions/comments regarding this wiki page, please file an issue under the Issues tab in this repository or contact us through the Archive Help Desk at [email protected].
### `__init__`

This is a standard initialization function that instantiates a set of variables on the class for use later in the code. The `max_memory` argument is the memory (in GB) that can be used when generating the cube. The smaller this number, the longer it will take to generate the cube; the appropriate value depends on the specifications of your machine. We suggest leaving this at the default 50 GB unless it causes computational errors.
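For orientation, the class is typically driven like this (a sketch; the FFI file names and output cube name are hypothetical):

```python
from astrocut import CubeFactory

# Hypothetical list of SPOC FFIs from one sector/camera/CCD observation.
ffi_files = ["tess-s0027-1-1-ffi-001.fits", "tess-s0027-1-1-ffi-002.fits"]

# max_memory caps the memory (in GB) used while building the cube;
# a smaller value means more (smaller) blocks and a longer runtime.
cube_maker = CubeFactory(max_memory=50)
cube_file = cube_maker.make_cube(ffi_files, cube_file="img-cube.fits")
```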
The attributes `block_size` and `num_blocks` depend on the input `max_memory`. This is because the FFIs are not read into the cube all at once, but rather in chunks, or "blocks". The block size, and therefore the number of blocks, is calculated from the `max_memory` value. More detail on this is given under `_configure_cube`, where these attributes are assigned values.

The remaining attributes are things such as header keywords, or variables that will be used across multiple functions, such as `cube_file`.
### `_configure_cube`

This method determines the block size (a subset of the rows of the full FFI) and the number of blocks to iterate through, based on the input `max_memory` from `__init__`. The largest possible block size, or `max_block_size`, is determined by taking the `max_memory` (converted from GB to bytes) and dividing it by the `slice_size`, which is the number of columns in the image, multiplied by the number of FFIs in the batch, multiplied by `2 * 4`. We multiply by `2` because each cube has 2 "layers" (axes): one dedicated to the science data and one dedicated to the error data. We multiply by `4` because each pixel is stored as a 32-bit float, i.e., exactly 4 bytes per pixel. After this calculation is done, the number of blocks is determined by taking the number of rows in the FFIs, dividing by the block size, adding `1`, and rounding down to the nearest integer (we add `1` because the integer cast always rounds down, so the final partial block would otherwise be missed).
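A minimal sketch of this arithmetic, with illustrative values (the clamp of the block size to the number of rows is an assumption):

```python
max_memory = 50                 # GB, from __init__
n_cols, n_rows = 2136, 2078     # illustrative SPOC FFI dimensions
n_ffis = 1000                   # illustrative number of FFIs in the batch

# Bytes per row-slice of the cube: columns * FFIs * 2 layers
# (science + error) * 4 bytes per 32-bit float pixel.
slice_size = n_cols * n_ffis * 2 * 4

# Largest number of rows that fits within the memory budget.
max_block_size = int((max_memory * 1e9) // slice_size)
block_size = min(max_block_size, n_rows)

# Add 1 before the integer cast so the trailing partial block
# survives the round-down.
num_blocks = int(n_rows / block_size + 1)
```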
This method also assigns the `template_file` variable. This is the FFI whose image header keywords are used as the template to initialize the image header table in the later steps.

Lastly, this method makes the cube's primary header. To do this, we copy the primary header keywords and keyword values from the first FFI in the stack to the cube file's primary header. Extra header keywords are added, such as the factory keywords, and lastly, some primary header keywords are populated with the keyword values from the last FFI in the stack, to indicate the time span of the observations included in the cube. These keywords are `DATE-END` and `TSTOP` for SPOC cubes, and `ENDTJD` for TICA cubes.
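As an illustration, the header copying and time-span bookkeeping might be sketched with `astropy.io.fits` like this (a simplified sketch reusing the hypothetical `ffi_files` list from above; the real code also adds the factory keywords):

```python
from astropy.io import fits

# Copy the primary header of the first FFI in the stack to serve as
# the cube's primary header.
primary_header = fits.getheader(ffi_files[0], 0)

# Populate the time-span keywords from the last FFI in the stack.
# For SPOC FFIs these keywords live in the EXT 1 (image) header.
last_header = fits.getheader(ffi_files[-1], 1)
primary_header["DATE-END"] = last_header.get("DATE-END")
primary_header["TSTOP"] = last_header.get("TSTOP")
```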
### `_build_info_table`

Using the template file defined in `_configure_cube`, this method sets up the columns of the image header table by making a column for each image header keyword in the FFI (and thus all the FFIs) and assigning it the correct data type. If you use `TicaCubeFactory` and receive an `Info Warning: Card too long` message, this is the piece of code that is the culprit. There is a keyword in the TICA header called `COMMENT` which contains a comment on the TICA processing that may have different lengths depending on the FFI. When the template file chosen for the stack contains a `COMMENT` keyword value string that is shorter than that of any of the FFIs that follow, this warning message is triggered. It does not appear to clip the `COMMENT` keyword values, so it is not something to worry about currently.

Once the column objects are defined, they are appended to a `cols` list, which is then turned into an `astropy.table.Table` object and assigned to the `info_table` variable.
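A simplified sketch of how these per-keyword columns might be assembled (the template file name is hypothetical and the dtype mapping is illustrative):

```python
import numpy as np
from astropy.io import fits
from astropy.table import Table, Column

n_ffis = 1000  # illustrative number of FFIs in the batch

# Image header of the template FFI (EXT 1 for SPOC products).
template_header = fits.getheader("template_ffi.fits", 1)

cols = []
seen = set()
for keyword, value in template_header.items():
    if not keyword or keyword in seen:
        continue  # skip blank cards and repeated keywords

    seen.add(keyword)

    # Choose a column dtype from the keyword's value type. Strings get
    # a fixed width taken from the template value, which is why a
    # template COMMENT shorter than a later FFI's can trigger the
    # "Card too long" warning for TICA.
    if isinstance(value, bool):
        dtype = np.bool_
    elif isinstance(value, int):
        dtype = np.int32
    elif isinstance(value, float):
        dtype = np.float64
    else:
        dtype = f"S{max(len(str(value)), 1)}"

    cols.append(Column(name=keyword, dtype=dtype, length=n_ffis))

info_table = Table(cols)
```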
### `_build_cube_file`

This method builds the cube file and writes it out under the name designated by the `cube_file` argument. It writes in the primary header that was created in the `_configure_cube` step, initializes the HDU list with just the primary HDU, and writes it to the file. It then creates the cube header for the image HDU, first populating the data side of the HDU with an empty cube of size `(100, 100, 10, 2)`, and then populating the header with keywords defining the dimensions of the cube. The SPOC pipeline does propagate errors in the production of the FFIs, so the science and error axes are both included in this cube.
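Schematically, the initialization might look like this (a sketch, not the exact code; `primary_header` is the header built in the `_configure_cube` sketch above):

```python
import numpy as np
from astropy.io import fits

# The primary HDU carries the header assembled in _configure_cube.
primary_hdu = fits.PrimaryHDU(header=primary_header)

# The image HDU starts with a small placeholder cube; its header
# records the true cube dimensions, and the file is grown to full
# size afterwards. The trailing axis holds the science/error layers.
placeholder = np.zeros((100, 100, 10, 2), dtype=np.float32)
cube_hdu = fits.ImageHDU(data=placeholder)

fits.HDUList([primary_hdu, cube_hdu]).writeto("img-cube.fits", overwrite=True)
```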
Next, it expands the file size so that it can accommodate all the FFIs that will be part of the cube. To do this, the total cube size (`cubesize_in_bytes`) must be calculated. The size is calculated by taking the total number of elements in the cube (the FFI area, multiplied by the number of FFIs and by the 2 science/error layers) and multiplying it by `4`, the byte size of one 32-bit float pixel. We then add `2880`, subtract `1`, integer-divide by `2880`, and multiply by `2880` once again. This is the standard idiom for rounding up to the next multiple of `2880` (FITS requires all blocks to be a multiple of 2880 bytes); subtracting the `1` ensures that no extra block is added when the size is already an exact multiple of `2880`.
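In code form, reusing the illustrative values from the earlier sketch, the rounding looks like this:

```python
# Total data size: rows * columns * number of FFIs * 2 layers,
# at 4 bytes per 32-bit float pixel.
data_size = n_rows * n_cols * n_ffis * 2 * 4

# Round up to the next multiple of 2880 bytes (the FITS block size);
# the "- 1" avoids adding a block when data_size is already a multiple.
cubesize_in_bytes = (data_size + 2880 - 1) // 2880 * 2880
```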
The last step in this method is writing a null byte. The method seeks past the current end of the file, to the position implied by the total cube size in bytes, and writes a single null byte there (`CUBE.write(b'\0')`). Writing this byte is what actually grows the file: up to this point all the size calculations have been done, but the file on disk only contains the headers and the small placeholder cube. By seeking past the end of the file and writing one byte, we expand the file to the size we calculated.
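A minimal sketch of this trick (`target_size`, the desired total file size in bytes, is an illustrative value):

```python
target_size = 2880 * 4  # illustrative total file size in bytes

with open("img-cube.fits", "r+b") as CUBE:
    # Seek past the current end of the file and write a single null
    # byte; the filesystem zero-fills the gap, growing the file to
    # exactly target_size bytes.
    CUBE.seek(target_size - 1)
    CUBE.write(b"\0")
```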
### `_write_block`

From here we begin the iterative process of writing the FFIs into the cube. The cube created with `_build_cube_file` is opened and fed into `_write_block` along with a few other arguments of interest:

- `start_row` and `end_row`, which determine the start and end rows that act as the edges of the block,
- `cube_hdu`, which is the cube created in `_build_cube_file` that is now being populated with the FFIs,
- `fill_info_table`, which is set to the boolean `True` and triggers the writing of the image header table to the third extension (`EXT=2`), and
- `verbose`, which will print out updates on the cube-making process when it is set to `True`.
`_write_block` is called within a for-loop, so it is iterated upon a number of times, depending on the number of blocks needed to write the entirety of each FFI into the cube. As such, the `start_row` and `end_row` arguments change on each iteration. For example: `start_row` will be `0` for the first iteration, then `0 + the size of the first block` for the second iteration, and so on, with `end_row` also being updated accordingly.
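Schematically, the driving loop might look like this (a sketch reusing names from the earlier sketches; the exact condition for `fill_info_table` may differ in the real code):

```python
from astropy.io import fits

with fits.open("img-cube.fits", mode="update") as cube_hdus:
    for i in range(num_blocks):
        # Advance the block edges by one block size per iteration,
        # clamping the final block to the last row of the image.
        start_row = i * block_size
        end_row = min(start_row + block_size, n_rows)

        cube_maker._write_block(cube_hdu=cube_hdus[1],
                                start_row=start_row, end_row=end_row,
                                fill_info_table=True, verbose=True)
```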
### `_write_info_table`

This method stores the table created in `_build_info_table` in a binary table extension of the cube FITS file. It assigns the correct data type for each column/image header keyword and builds the table HDU from these columns. The table HDU is then appended to the existing cube file.
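Appending an `astropy.table.Table` as a binary table extension can be sketched as follows (assuming `info_table` from the `_build_info_table` sketch above):

```python
from astropy.io import fits

# Convert the info table into a binary table HDU and append it to the
# cube file, where it becomes the third extension (EXT=2).
table_hdu = fits.table_to_hdu(info_table)
fits.append("img-cube.fits", table_hdu.data, table_hdu.header)
```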
## `TicaCubeFactory`

The `TicaCubeFactory` class takes in a batch of TESS Image CAlibrator (TICA) High-Level Science Product full frame images and generates a 4-dimensional cube from them, which is then used by `CutoutFactory` to generate a cutout TPF for a designated target or set of coordinates. The architecture of `TicaCubeFactory` mirrors that of `CubeFactory`, with some minor but notable differences resulting from the difference in file structure between the TICA and SPOC products. As with `CubeFactory`, the main method of this class is `make_cube`, but there is also an optional method called `update_cube` which serves to update an existing cube file by appending new FFIs to it. This functionality exists because TICA FFIs are delivered multiple times throughout the course of a sector, unlike their SPOC counterparts, which are delivered all at once after the end of the sector cycle. Below is a comparison of `make_cube` and `update_cube` as it relates to the methods that each of them calls.
| Step Num. | `make_cube` | `update_cube` |
|---|---|---|
| 1 | `_configure_cube` | `_configure_cube` |
| 2 | `_build_info_table` | `_build_info_table` |
| 3 | `_build_cube_file` | `_write_block` |
| 4 | `_write_block` | `_update_info_table` |
| 5 | `_write_info_table` | `_write_info_table` |
As you can see, there is no `_build_cube_file` step under `update_cube`, since the existing cube has already been built. Moreover, there is a new method called `_update_info_table` which serves to rebuild the image header keyword table with the new FFIs included. The table is rebuilt rather than appended to because of the limitations of working with FITS tables. There is a strong case for modifying this step in order to improve the processing time of `update_cube`.
In the sections below, we will go through all the methods under `TicaCubeFactory` (except `make_cube` and `update_cube`, which mainly call upon the others) and point out some of the differences from `CubeFactory` that are not as straightforward, though not at the same level of detail as the descriptions above. For fuller descriptions of these methods, see the analogous sections under `CubeFactory`.
### `__init__`

The only notable differences between the initialization of `TicaCubeFactory` and that of `CubeFactory` are some of the header keyword declarations in the `__init__`, the TICA keywords being analogous, but not identical, to the SPOC versions. For example, the TICA keywords `CAMNUM` and `CCDNUM` are analogous to the SPOC keywords `CAMERA` and `CCD`. There are also some attribute declarations that support the `update_cube` functionality further down the class.
### `_configure_cube`

The main difference in this version of `_configure_cube` is that the header keyword values are taken from the 0th extension rather than the 1st extension. This is because for TICA FFIs, the actual science data and science-related keywords are stored in the primary HDU, rather than on their own in an image HDU. This method also contains some functionality for updating an already-existing `cube_file`, and for adding a `HISTORY` header keyword to document the iterations a cube file has gone through.
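In `astropy.io.fits` terms, the difference amounts to which HDU the keywords are read from (the file names are hypothetical):

```python
from astropy.io import fits

spoc_header = fits.getheader("spoc_ffi.fits", 1)  # SPOC: science keywords in EXT 1
tica_header = fits.getheader("tica_ffi.fits", 0)  # TICA: science keywords in the primary HDU (EXT 0)
```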
### `_build_info_table`

This has the same functionality as `CubeFactory._build_info_table`.
### `_build_cube_file`

This has the same functionality as `CubeFactory._build_cube_file`, with some small differences, as noted here. It is only called by `make_cube`.

Because the TICA pipeline that produces the TICA products does not propagate errors, the data side of the image HDU is populated with an empty cube of size `(100, 100, 10, 1)`, containing only the science axis.
### `_write_block`

A notable difference in the TICA version of `_write_block` is that a new header keyword called `COMMENT` must be parsed. It is stored as a `HeaderCommentaryCard` in the header, so it must be converted to a string object before being stored in the info table.
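Since indexing the header by `COMMENT` returns a commentary-card object rather than a plain string, the conversion is essentially a cast (sketched here with a hypothetical TICA file name):

```python
from astropy.io import fits

header = fits.getheader("tica_ffi.fits", 0)

# header["COMMENT"] is a commentary-card object, not a plain string;
# cast it to str (which joins its lines) before storing it in the
# info table.
comment_str = str(header["COMMENT"])
```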
### `_update_info_table`

This method rebuilds an existing info table with the new FFI information appended to it. This is done by concatenating the original table (`og_table`) with the new-FFIs table created in `_build_info_table`, column by column. Once each column is concatenated with its new version, it is turned into a `Column` object and appended to the `cols` list, which is then turned into an `astropy.table.Table` object that is the newly-assigned `info_table` to be written into the cube file in the next step.
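Conceptually, the rebuild amounts to the following (a sketch; `new_table` is a hypothetical name for the table built in `_build_info_table`):

```python
import numpy as np
from astropy.table import Table, Column

cols = []
for name in og_table.colnames:
    # Stack the original column on top of the corresponding column
    # from the new-FFI table, then wrap the result in a new Column.
    combined = np.concatenate([og_table[name], new_table[name]])
    cols.append(Column(combined, name=name))

# The rebuilt table replaces the old info_table and is written out
# by _write_info_table in the next step.
info_table = Table(cols)
```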
### `_write_info_table`

This has the same functionality as `CubeFactory._write_info_table`.