LBL pipeline DECAT alerts

The LBL pipeline subtracts images and searches the difference images for sources. It sends Avro alerts to a Kafka broker. Direct questions about it to Rob Knop on the #pipeline channel of the DECAT Slack, or to raknop@lbl.gov.

Alert brokers and topics
Alerts and schemas
Stream quality
Image access
About the pipeline
References
Caveats and considerations

Alert brokers and topics

The following broker is receiving alerts; consume alerts from it if you want to receive them. If you run a broker and also want to receive alerts, email Rob at the address above.

public.alerts.ztf.uw.edu:9092

The topic for alerts will be decat_{yyyymmdd}_{propid}, where yyyymmdd is the calendar date of the evening of the observations (matching the "caldat" field of the NOIRLab API) and propid is the proposal ID. As of this writing (2021-04-20), we're only subtracting the Graham et al. images, so the propid will be 2021A-0113. However, if other projects' fields are in places where we have references (see "References" below), we will be happy to subtract and search their data as well.
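
If you subscribe by topic name, it can be convenient to build the name rather than type it. Below is a minimal Python sketch of the decat_{yyyymmdd}_{propid} convention described above. The helper names are hypothetical and not part of any DECAT software; connecting to the broker and subscribing is left to whichever Kafka client library you use.

```python
from datetime import date, datetime


def decat_topic(caldat: date, propid: str) -> str:
    """Build a topic name of the form decat_{yyyymmdd}_{propid}.

    caldat should be the calendar date of the *evening* the observations
    started (the NOIRLab "caldat"), which can differ from the UTC date of
    individual exposures taken after midnight.
    """
    return f"decat_{caldat:%Y%m%d}_{propid}"


def parse_decat_topic(topic: str) -> tuple:
    """Split a decat_{yyyymmdd}_{propid} topic name back into (date, propid)."""
    prefix, yyyymmdd, propid = topic.split("_", 2)
    if prefix != "decat":
        raise ValueError(f"not a DECAT topic: {topic!r}")
    return datetime.strptime(yyyymmdd, "%Y%m%d").date(), propid


print(decat_topic(date(2021, 4, 20), "2021A-0113"))  # decat_20210420_2021A-0113
```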

Alerts and schemas

Alert schemas are embedded with the alerts. You can find the schemas at https://github.com/rknop/decat_schema. As of this writing (2021-04-20) the current schema version is 0.14. If the schemas change, the version number will increase.

Each alert will be an "object" alert, defined by the decat_object.avsc schema. This alert has the information about an object that the pipeline has identified. Ideally, one object is a transient at a single RA/Dec: each transient (that is, detections within 2 arcseconds of the same position) will only ever be identified as one object, with the "objectid" being a unique identifier of that object. In practice, the pipeline sometimes identifies multiple objects right next to each other because of subtraction artifacts near very bright or saturated objects. (See "Stream quality" below.)
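
To check by hand whether two objects in the stream are really the same spot on the sky (for example, the near-duplicates that show up around bright stars), you can compute their angular separation and compare it with the 2-arcsecond scale. The Python sketch below uses the haversine formula. It assumes you have each object's RA and Dec in degrees (the alert field names for these are not listed on this page), the function names are hypothetical, and it is not necessarily how the pipeline does its own association.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds between two positions in degrees.

    Uses the haversine formula, which stays numerically accurate at the
    arcsecond scales relevant here.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(hav))) * 3600.0


def same_position(ra1, dec1, ra2, dec2, radius_arcsec=2.0):
    """True if two positions are within radius_arcsec (default 2") of each other."""
    return angular_separation_arcsec(ra1, dec1, ra2, dec2) <= radius_arcsec
```

Note that a difference in RA shrinks by cos(Dec) on the sky, which is why comparing raw RA/Dec differences against 2 arcseconds would be wrong away from the equator.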

Embedded in the "object" alert are one or more "source" fields (defined by the decat_source.avsc schema). A "source" is one detection of the object on a subtraction. A real object will usually have multiple sources as it is redetected in different images. (The nomenclature is inspired by the proposed LSST alert schema.)

Each time the pipeline detects an object on a subtraction, it will resend the "object" alert for that object. The "triggersource" field holds the source that triggered this particular alert; it includes gzipped 51×51 FITS cutouts of the search image, the subtraction template, and the difference image around the source. Every source that the pipeline has previously detected at the location of this candidate will be included in the "sources" array. To keep the alert size manageable, the elements of this array do not include the FITS cutouts.
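
A minimal Python sketch of walking this structure, assuming the alert has already been deserialized into a dict (for example by an Avro library using the embedded schema). It uses only field names mentioned on this page; the function name is hypothetical.

```python
def summarize_object_alert(alert: dict) -> dict:
    """Pull the triggering source and detection history out of an "object" alert."""
    trigger = alert["triggersource"]
    # Only the triggering source carries the gzipped FITS cutouts.
    cutouts = {name: trigger[name] for name in ("scicutout", "refcutout", "diffcutout")}
    # Index every detection by sourceid. Whether the trigger also appears in
    # "sources" is not assumed here, so it is added only if missing.
    detections = {s["sourceid"]: s for s in alert.get("sources") or []}
    detections.setdefault(trigger["sourceid"], trigger)
    return {
        "objectid": alert["objectid"],
        "trigger_sourceid": trigger["sourceid"],
        "n_detections": len(detections),
        "expnames": sorted({s["expname"] for s in detections.values()}),
        "cutouts": cutouts,
    }
```

Note that n_detections counts sourceids, so an image that was re-run through the pipeline is counted twice; see "Caveats and considerations" below.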

Stream quality

The quality of the stream is far from perfect. It's all automated, so it does not benefit from multiple trained observers looking at the images to decide whether the detections look like real sources or subtraction artifacts. Ideally, the ML system (see "About the pipeline" below) takes their place. However, as with any ML system, if your training set is not statistically identical to the things you're going to ask it to classify, the classifications will not be nearly as good as what you measured on your test set (which was presumably statistically identical to the training set).

The current cutoff we're using with the ML system will, we believe, include the vast majority of real sources in the subtractions. It will also include some subtraction artifacts. Because of both of these, don't treat the results of the search stream as a complete and clean sample of anything. "Real" sources include anything that looks like a PSF and is not a subtraction artifact. Many of these will be asteroids (usually a source that is present in the science image at the position of a blank spot in the template image). Ideally, they will also include variable stars, variable AGN, supernovae, and other real astronomical transients. However, undersubtracted stars (the result of nonlinearities, saturation, or other imperfections) will also show up in the stream as real sources.

Inevitably, however, the stream will include some sources that are just subtraction artifacts. Our hope is that these are a small fraction (a few percent) of the "real" sources in the stream.

We will probably continue to tweak the pipeline over the coming months. Ideally, we can improve the subtractions in order to reduce subtraction artifacts. We may also try to re-train the ML system on images from this search in an attempt to improve its accuracy.

Image access

Embedded in the "source" fields of the alert (specifically, in the triggering source) are 51×51 cutouts of the images around the detected source. The three relevant fields are scicutout, refcutout, and diffcutout. The first is a cutout of the "search" image, the image taken as part of the project that the pipeline was processing. The second is a cutout of the image that was used as a subtraction reference (see "References" below). The third is the result of the PSF-matched image subtraction, done with hotpants (I didn't make up that name, don't blame me). As of schema version 0.14, these are gzipped FITS images with a very abbreviated header. A future version of the schema may change the compression from gzip to fpack.
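
In practice you would probably open these with astropy.io.fits after decompressing them. The stdlib-only Python sketch below just shows what is inside the bytes: it gunzips a cutout and reads a simple 2-D primary image. It assumes a single HDU carrying BITPIX, NAXIS1, and NAXIS2 keywords, handles only basic header conventions, and would need changing if the compression switches to fpack. The function name is hypothetical.

```python
import gzip
import struct

# struct format characters for each FITS BITPIX value (FITS data are big-endian).
_BITPIX_FORMATS = {8: "B", 16: "h", 32: "i", 64: "q", -32: "f", -64: "d"}


def read_cutout(gzipped_fits: bytes):
    """Decode a gzipped single-HDU FITS cutout into (header, rows).

    rows[j][i] is the pixel at x = i + 1, y = j + 1 (NAXIS1 varies fastest).
    """
    raw = gzip.decompress(gzipped_fits)
    header, offset = {}, 0
    while True:
        if offset + 80 > len(raw):
            raise ValueError("FITS header has no END card")
        card = raw[offset:offset + 80].decode("ascii")
        offset += 80
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] == "= ":
            # Naive: drops the comment after '/', so a '/' inside a quoted
            # string value would be truncated.
            value = card[10:].split("/", 1)[0].strip()
            header[key] = value.strip("'").strip()
    # The header occupies a whole number of 2880-byte blocks.
    offset = -(-offset // 2880) * 2880
    bitpix = int(header["BITPIX"])
    nx, ny = int(header["NAXIS1"]), int(header["NAXIS2"])
    bscale = float(header.get("BSCALE", 1.0))
    bzero = float(header.get("BZERO", 0.0))
    fmt = ">" + _BITPIX_FORMATS[bitpix] * (nx * ny)
    values = struct.unpack_from(fmt, raw, offset)
    rows = [[bzero + bscale * v for v in values[j * nx:(j + 1) * nx]]
            for j in range(ny)]
    return header, rows
```

For a DECAT cutout you would expect NAXIS1 and NAXIS2 to both be 51.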

The name of the raw science image as found on the NOIRLab Astro Data Archive is in the field expname of the "source" schema. If it's your project, or if the data is immediately public (which is the case for 2021A-0113), you can always get the original images from there.

There are also three fields in the source schema named *url. These correspond to the *cutout fields, and are URLs for the full reduced science image, the warped reference image (aligned to the science image with scamp and swarp), and the difference image. These images will not be immediately available. As of this writing, hand intervention is still necessary to move the images from where the pipeline runs to where they will sit on the web server. As such, treat these URL fields as the place where we eventually intend to make the images available. Also, see "Caveats and considerations" below.

About the pipeline

This is not intended to be a complete description of the pipeline.

The LBL pipeline grabs images from the NOIRLab data server. Each time it finds a new image, it processes it. It registers the WCS using Gaia, and determines zeropoints by comparing photometry of the image (from SExtractor) to photometry from the DES, DECaLS, or DECaPS survey. It then looks in those same surveys for images to use as subtraction references. It uses scamp and swarp to align the subtraction reference (which is in general much deeper than the science image) with the science image, then PSF-matches and subtracts with hotpants. The pipeline uses SExtractor to search the resulting difference image for residual sources. It makes some basic cuts on those sources, and then feeds 51×51 cutout images to a machine learning (ML) system developed by Venkitesh Ayyar, which produces a "real/bogus" score (the rb field in the "source" alerts). Higher numbers indicate that the ML system is more confident that this is a real source. (Real sources, ideally, include anything that is PSF-like in the subtraction. This includes supernovae, variable stars, variable AGN, and asteroids, but also undersubtracted bright or saturated stars. However, see "Stream quality" above.) The pipeline then sends out an alert for each source it finds with rb above a cutoff (0.6 as of this writing).
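
Since every source in the stream has already passed the pipeline's rb cut, the main thing a consumer can do with rb is apply a stricter one. A minimal Python sketch, assuming sources are decoded dicts with an rb field; the function name is hypothetical and the default threshold of 0.8 is an arbitrary illustration, not a recommended value.

```python
def select_by_rb(sources, min_rb=0.8):
    """Return sources with rb >= min_rb, most confident first.

    Thresholds at or below the pipeline's own cutoff (0.6 as of this
    writing) keep everything, since weaker sources were never sent.
    """
    kept = [s for s in sources if s["rb"] >= min_rb]
    return sorted(kept, key=lambda s: s["rb"], reverse=True)
```

Raising the threshold trades completeness for purity. Given the caveats under "Stream quality", rb alone is unlikely to give a clean sample.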

References

The pipeline currently looks for references in only a few places. It uses them for two purposes: photometric calibration and subtraction templates. So that the magnitudes in the alerts have some meaning, it needs references that are well calibrated photometrically; for this reason, it won't search the whole NOIRLab data archive for images of the same field. If a science image isn't in a region of the sky covered by one of the surveys we search, the pipeline will punt and not subtract or search that science image. It will also punt if there are no references in the same filter as the science image. (For instance, DECaLS has no i-band images.)

The surveys that the pipeline currently uses for references are:

DES DR1 (pulling images from desdr-server.ncsa.illinois.edu/despublic/dr1_tiles)
DECaLS DR8 (pulling images from portal.nersc.gov/cfs/cosmo/data/legacysurvey/dr8/south)
DECaPS DR1 (pulling images from portal.nersc.gov/cfs/cosmo/data/decaps/dr1)

For the current search (i.e. DECAT), only the DECaLS references have been used in our tests. The DES references were used in a previous version of the pipeline for a previous project, so for now all we can do is hope that they work correctly.

Hopefully soon, the pipeline will be updated to use DES DR2 and DECaLS DR9. Also, we intend to add the DECaPS survey for the galactic fields of 2021A-0113.

Caveats and considerations

In no particular order:

See Stream quality above.
Inevitably, redundant sources will be included in the alerts, for a variety of reasons. As we find and fix new failure modes in the pipeline, we may re-run some images through it. This produces a new "source" with its own unique sourceid that is the same thing the pipeline already found in that image. Look at the "expname" field of the "source" parts of the alert; if the same expname shows up more than once, then we ran the same image through the pipeline a second time and it redetected something it had already detected.
At the moment, the pipeline subtracts and searches each new image as soon as it shows up on the NOIRLab server. We intend to make a second version of the pipeline that runs after the end of each night and coadds all images of the same field before subtracting and searching. This second pipeline will catch more sources, because the stacks are deeper than the individual images. Obviously, these stack detections will not be independent of detections in the images that went into the stack! Stack searching is not currently implemented, however, so there will be no alerts from image stacks.
One way to decide whether a source is real is to check whether it shows up on multiple different images, or on subsequent nights. There are a number of caveats about this. The pipeline will only include in the "sources" field of the alert things that it detected in its own subtractions. As such, if it detects an object on a good-seeing night, the alert will not necessarily include observations of that same object from other images. The object may well have been in those other images, but with S/N too low for the pipeline to detect it.
Because of the way the pipeline defines objects, asteroids will show up as multiple different objects. When an asteroid moves between search images, it will be in a new place, so the pipeline will consider it a new object. If you are interested in asteroids, additional thought will be required to make use of the alert stream.
If we start running the pipeline on projects that do not immediately make their data public, we are going to have to think about how to handle the *url fields of the alerts, as we won't want to put the reduced, warped, and subtracted images from those projects on the web server.
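
The expname checks described in the list above can be sketched in Python as follows, assuming you have an object's sources (the "sources" array plus the triggering source) as decoded dicts. The function names are hypothetical. A distinct expname only means a distinct exposure, not a distinct night; telling nights apart would need a date field not covered on this page.

```python
from collections import Counter


def reprocessed_expnames(sources):
    """Exposure names that appear more than once among an object's sources.

    A repeated expname means the same image went through the pipeline
    again, so those sources are not independent detections.
    """
    counts = Counter(s["expname"] for s in sources)
    return sorted(name for name, n in counts.items() if n > 1)


def n_distinct_images(sources):
    """Number of distinct exposures on which the object was detected."""
    return len({s["expname"] for s in sources})


def detected_on_multiple_images(sources, min_images=2):
    """True if the object was detected on at least min_images distinct exposures."""
    return n_distinct_images(sources) >= min_images
```

Requiring multiple images will drop asteroids (each detection becomes its own object) as well as real sources that were only bright enough to be detected once.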