
Overview of `make_cube.py`

C. E. Brasseur edited this page Sep 1, 2023 · 31 revisions

Introduction

The make_cube.py script houses two classes: CubeFactory and TicaCubeFactory. Both classes generate a cube from a batch of TESS full frame images (FFIs) that correspond to a single TESS sector/camera/CCD observation. The two FFI product types are structurally different, which is why there is one class per type: CubeFactory generates a cube from a batch of mission-delivered FFIs (SPOC FFIs), and TicaCubeFactory generates a cube from a batch of TESS Image CAlibrator FFIs (TICA High-Level Science Product FFIs). There are advantages to using each type, which are outlined in the TESSCut API documentation. This wiki page will walk you through the make_cube.py functionality.


CubeFactory

The CubeFactory class takes in a batch of mission-delivered (SPOC) full frame images and generates a 4-dimensional cube from them, which is then used by CutoutFactory to generate a cutout target pixel file (TPF) for a designated target or set of coordinates. The main method of this class is make_cube, which calls upon the other methods to generate the cube and store it in a FITS file. The order in which the methods are called is as follows:

  1. __init__
  2. _configure_cube
  3. _build_info_table
  4. _build_cube_file
  5. _write_block
  6. _write_info_table

The functionality of each of these methods will be discussed briefly in the sections following. Should there be any questions/comments regarding this wiki page, please file an issue under the Issues tab in this repository or contact us through the Archive Help Desk at [email protected].

__init__

This is a standard initialization function that sets a number of attributes on the class for use later in the code. The max_memory argument is the allocated memory (in GB) that can be used when generating the cube. The smaller this number, the longer it will take to generate the cube; the right value will depend on the specifications of your machine. We suggest leaving this at the default 50 GB unless it causes computational errors.

The attributes block_size and num_blocks are dependent on the input max_memory. This is because the FFIs are not read into the cube all at once, but rather in chunks or "blocks". The block size, and therefore the number of blocks, is calculated based on the max_memory value. More detail on this is given under _configure_cube, where these attributes are assigned values.

The remaining attributes are things such as header keywords, or variables such as cube_file that are used across multiple methods.

_configure_cube

This method determines the block size (a subset of the full FFI) and the number of blocks to iterate through, based on the max_memory input from __init__. The largest possible block size, max_block_size, is determined by taking max_memory (converted from GB to bytes) and dividing it by slice_size, which is the number of columns in the image multiplied by the number of FFIs in the batch, times 2, times 4. We multiply by 2 because each cube has two "layers" (axes): one dedicated to the science data and one dedicated to the error data. We multiply by 4 because each pixel is stored as a 32-bit float, which occupies 4 bytes. After this calculation is done, the number of blocks is determined by taking the number of rows in the FFIs, dividing by the block size, adding 1, and truncating to an integer (we add 1 because the integer conversion always rounds down, so the final, partial block would otherwise be lost).
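The arithmetic above can be sketched as follows. This is a minimal illustration of the calculation as described, not the actual code from make_cube.py; the function name and arguments (compute_blocks, image_shape, n_ffis) are illustrative.

```python
def compute_blocks(max_memory_gb, image_shape, n_ffis):
    """Return (block_size, num_blocks) for a cube build.

    max_memory_gb : allocated memory budget in GB
    image_shape   : (rows, cols) of a single FFI
    n_ffis        : number of FFIs in the batch
    """
    rows, cols = image_shape

    # Bytes needed to hold one full row of the cube:
    # cols * n_ffis * 2 (science + error layers) * 4 (32-bit floats)
    slice_size = cols * n_ffis * 2 * 4

    # Largest number of rows that fits within the memory budget
    max_block_size = int((max_memory_gb * 1e9) // slice_size)

    # Integer conversion truncates, so add 1 to cover the final partial block
    num_blocks = int(rows / max_block_size + 1)
    return max_block_size, num_blocks
```

For example, a 1 GB budget with 2078 × 2136 pixel FFIs and 1000 FFIs in the batch yields 58-row blocks, and the block count times the block size always covers the full image height.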

This is also the method that will assign the template_file variable. This is the FFI whose image header keywords are used as the template to initialize the image header table in the later steps.

Lastly, this method also makes the cube's primary header. To do this, we copy the primary header keywords and keyword values from the first FFI in the stack to the cube file's primary header. Extra header keywords, such as the factory keywords, are added, and lastly, some primary header keywords are populated with the keyword values from the last FFI in the stack, to indicate the time span of the observations included in the cube. These keywords are DATE-END and TSTOP for SPOC cubes, and ENDTJD for TICA cubes.
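The header assembly described above can be sketched like this. Plain dicts stand in for FITS header objects here, and the keyword values and the ORIGIN factory keyword are illustrative examples, not necessarily what make_cube.py writes.

```python
def build_primary_header(first_ffi_hdr, last_ffi_hdr):
    """Assemble a cube primary header (dicts standing in for FITS headers)."""
    primary = dict(first_ffi_hdr)        # copy keywords from the first FFI
    primary["ORIGIN"] = "STScI/MAST"     # example "factory" keyword

    # Time-span keywords come from the *last* FFI in the stack
    # (DATE-END and TSTOP for SPOC; ENDTJD for TICA)
    for kwd in ("DATE-END", "TSTOP"):
        primary[kwd] = last_ffi_hdr[kwd]
    return primary
```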

_build_info_table

With the template file defined in _configure_cube, this method sets up the columns of the image header table by making a column for each image header keyword in the FFI (and thus all the FFIs) and assigning it the correct data type. If you use TicaCubeFactory and receive an Info Warning: Card too long message, this piece of code is the culprit. There is a keyword in the TICA header called COMMENT which contains a comment on the TICA processing whose length may vary from FFI to FFI. When the template file chosen for the stack contains a COMMENT keyword value string that is shorter than that of an FFI that follows, this warning message is triggered. It does not appear to clip the COMMENT keyword values, so it is not something to worry about currently.

Once the column objects are defined, they are appended to a cols list which is then turned into an astropy.table.Table object, and assigned to the info_table variable.
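The per-keyword typing step can be sketched as below. The real code builds astropy Column objects; this illustrative stand-in just records (name, dtype) pairs, and shows why string columns get a fixed width derived from the template, which is the source of the Card too long warning.

```python
def column_dtypes(template_header):
    """Map each header keyword to a column data type (illustrative sketch)."""
    cols = []
    for kwd, val in template_header.items():
        # bool must be checked before int: in Python, bool is a subclass of int
        if isinstance(val, bool):
            dtype = "bool"
        elif isinstance(val, int):
            dtype = "int64"
        elif isinstance(val, float):
            dtype = "float64"
        else:
            # Strings become fixed-width columns sized from the *template*
            # value, so a longer COMMENT in a later FFI triggers the warning
            dtype = f"S{len(str(val))}"
        cols.append((kwd, dtype))
    return cols
```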

_build_cube_file

This method builds the cube file and writes it into a file with the name designated via the cube_file argument in the call. It writes in the primary header that was created in the _configure_cube step, and initializes the HDU list with just the primary HDU and writes it into the file. It then creates the cube header for the image HDU, first populating the data side of the HDU with an empty cube of size (100, 100, 10, 2), and then populates the header with some keywords defining the dimensions of the cube. The SPOC pipeline does propagate errors in the production of the FFIs, so the science and error axes are both included in this cube.

Next, it expands the file size so that it can accommodate all the FFIs that will be part of the cube. To do this, the total cube size (cubesize_in_bytes) must be calculated. The size is calculated by taking the area of the FFIs and multiplying it by 4, the byte size of one pixel (each pixel is a 32-bit float). Because FITS requires a file to be a multiple of the 2880-byte block size, this value is then rounded up to the next multiple of 2880: we add 2880, subtract 1, integer divide by 2880, and multiply by 2880 once again. Subtracting 1 ensures that a size which is already an exact multiple of 2880 does not gain an unnecessary extra block.

The last step in this method is writing a null byte. The file pointer is moved past the current end of the file, to the total calculated size of the cube, and a single null byte is written (CUBE.write(b'\0')). Writing this byte is what actually grows the file: up to this point all of the size calculations have been done, but the file itself is unchanged. By seeking past the end of the file and writing a single byte, we expand the file to the size we calculated.
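Both the 2880-byte rounding and the seek-past-the-end trick can be demonstrated with a small self-contained sketch (the function names are illustrative, not those in make_cube.py):

```python
import os
import tempfile

def fits_padded_size(nbytes):
    """Round a raw byte count up to the next multiple of the 2880-byte FITS block."""
    return ((nbytes + 2880 - 1) // 2880) * 2880

def grow_file(path, target_size):
    """Expand a file to target_size bytes without writing any actual data."""
    with open(path, "r+b") as f:
        f.seek(target_size - 1)   # seek past the current end of the file...
        f.write(b"\0")            # ...writing one null byte grows it

# Demo: create an empty file and grow it to a padded size
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
grow_file(tmp.name, fits_padded_size(5000))
```

After this runs, the file on disk is 5760 bytes (5000 rounded up to two FITS blocks) even though only a single byte was ever written.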

_write_block

From here we begin the iterative process of writing the FFIs into the cube. The cube created with _build_cube_file is opened and fed into _write_block along with a few other arguments of interest:

  1. start_row and end_row, which determine the start and end rows that act as the edges of the block,
  2. cube_hdu, which is the cube created in _build_cube_file and is now being populated with the FFIs,
  3. fill_info_table, which, when set to True, triggers the writing of the image header table into the third extension (EXT=2), and
  4. verbose, which will print out updates on the cube-making process when it is set to True.

_write_block is called within a for-loop, so it is called a number of times, depending on the number of blocks needed to write the entirety of each FFI into the cube. As such, the start_row and end_row arguments change for each iteration. For example: start_row will be 0 for the first iteration, then 0 + size of the first block for the second iteration, and so on, with end_row also being updated accordingly.
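One natural way to produce those (start_row, end_row) pairs is sketched below; the real loop in make_cube.py derives its iteration count from num_blocks, but the ranges it covers are the same.

```python
def block_ranges(n_rows, block_size):
    """Yield (start_row, end_row) for each block of the cube."""
    for start_row in range(0, n_rows, block_size):
        # The final block is clipped so it never runs past the image edge
        end_row = min(start_row + block_size, n_rows)
        yield start_row, end_row
```

For a 2078-row FFI with 1000-row blocks this yields (0, 1000), (1000, 2000), and a final partial block (2000, 2078).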

_write_info_table

This method stores the table created in _build_info_table into a binary table extension in the cube FITS file. It assigns the correct data type for each column/image header keyword, and builds the table HDU from these columns. The table HDU is then appended to the existing cube file.
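The "correct data type for each column" step maps each column's dtype onto a FITS binary-table TFORM code. The helper below is hypothetical (the real method uses astropy's table machinery), but the TFORM codes themselves come from the FITS standard: J for 32-bit integers, K for 64-bit, E/D for single/double floats, L for logicals, and nA for fixed-width strings.

```python
def tform_for(dtype, width=1):
    """Return the FITS binary-table TFORM code for a column dtype (sketch)."""
    codes = {"int32": "J", "int64": "K", "float32": "E",
             "float64": "D", "bool": "L"}
    if dtype.startswith("S"):          # fixed-width string column, e.g. "S24"
        return f"{dtype[1:]}A"
    return f"{width}{codes[dtype]}"
```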

TicaCubeFactory

The TicaCubeFactory class takes in a batch of TESS Image CAlibrator (TICA) High-Level Science Product full frame images and generates a 4-dimensional cube from them, which is then used by CutoutFactory to generate a cutout TPF for a designated target or set of coordinates. The architecture of TicaCubeFactory mirrors CubeFactory's, with some minor but notable differences resulting from the difference in file structure between the TICA and SPOC products. As with CubeFactory, the main method of this class is make_cube, but there is also an optional method called update_cube, which updates an existing cube file by appending new FFIs to it. This functionality exists because TICA FFIs are delivered multiple times over the course of a Sector, unlike their SPOC counterparts, which are delivered all at once after the end of the Sector cycle. Below is a comparison of make_cube and update_cube in terms of the methods each of them calls.

| Step Num. | make_cube         | update_cube        |
|-----------|-------------------|--------------------|
| 1         | _configure_cube   | _configure_cube    |
| 2         | _build_info_table | _build_info_table  |
| 3         | _build_cube_file  | _write_block       |
| 4         | _write_block      | _update_info_table |
| 5         | _write_info_table | _write_info_table  |

As you can see, there is no _build_cube_file step under update_cube, since the existing cube has already been built. Moreover, there is a new method called _update_info_table which rebuilds the image header keyword table with the new FFIs included. The table is rebuilt rather than appended to because of the limitations of working with FITS tables. There is a strong case for modifying this step to improve the processing time of update_cube.

In the sections below, we will go through all the methods under TicaCubeFactory (except make_cube and update_cube, which mainly call upon the others) and point out some of the differences from CubeFactory that are not as straightforward, though not at the same level of detail as the descriptions in CubeFactory. For more on these methods, see the analogous sections under CubeFactory.

__init__

The only notable differences between the initialization of TicaCubeFactory and CubeFactory are that some of the header keyword declarations in the TICA __init__ are analogous, but not identical, to the SPOC versions. For example, the CAMNUM and CCDNUM TICA keywords are analogous to the SPOC keywords CAMERA and CCD. There are also some attribute declarations that support the update_cube functionality further down the class.

_configure_cube

The main difference in this version of _configure_cube is that the header keyword values are taken from the 0th extension rather than the 1st. This is because for TICA FFIs, the science data and science-related keywords are stored in the primary HDU rather than in a separate image HDU. This method also contains some functionality for updating an already-existing cube_file, and for adding HISTORY header keywords to document the iterations a cube file has gone through.

_build_info_table

This has the same functionality as CubeFactory._build_info_table.

_build_cube_file

This has the same functionality as CubeFactory._build_cube_file, with the small differences noted below. It is only called for make_cube.

Because the TICA pipeline that produces the TICA products does not propagate errors, the image HDU's data is initialized with an empty cube of size (100, 100, 10, 1), containing only the science axis.

_write_block

A notable difference in the TICA version of _write_block is that a new header keyword, COMMENT, must be parsed. It is stored as a HeaderCommentaryCard in the header, so it must be converted to a string object type before being stored in the info table.

_update_info_table

This method rebuilds an existing info table with the new FFI information appended to it. This is done by concatenating the original table (og_table) with the new-FFIs table created in _build_info_table, column by column. Once each column is concatenated to its new version, it is turned into a Column object and appended to the cols list, which is then turned into an astropy.table.Table object that is the newly-assigned info_table to be written into the cube file in the next step.
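The column-by-column rebuild can be sketched as below, with plain dicts of lists standing in for astropy tables (the function name mirrors the method; the data is illustrative).

```python
def update_info_table(og_table, new_table):
    """Rebuild the info table by concatenating each original column
    with its new-FFI counterpart (dicts of lists standing in for tables)."""
    combined = {}
    for name in og_table:
        combined[name] = og_table[name] + new_table[name]
    return combined
```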

_write_info_table

This has the same functionality as CubeFactory._write_info_table.