Jack Kwok is a Software Engineer with 15 years of professional experience. At Insight, he built a Deep Learning solution to automatically detect flooded roads during natural disasters. He is now a Software Engineer at Lyft working with Machine Learning and Deep Learning.
Flooded roads are a huge risk
With global climate change, devastating hurricanes are occurring with higher frequency.
After a hurricane, roads are often flooded or washed out, making them treacherous for motorists. According to The Weather Channel, almost two of every three U.S. flash flood deaths from 1995–2010, excluding fatalities from Hurricane Katrina, occurred in vehicles.
During my Insight A.I. Fellowship, I designed a system that detects flooded roads and created an interactive map app. Using state-of-the-art computer vision deep learning methods, the system automatically annotates flooded, washed-out, or otherwise severely damaged roads from satellite imagery.
Using Deep Learning to detect danger
Most modern machine learning techniques require labeled data. Manually labeling damaged roads on thousands of satellite map tiles would be time-consuming and costly. Instead, I trained a robust, highly accurate road segmentation model from readily available satellite and street-mapping data of roads in good condition.
Then I used my model to compare road segmentation on pre-flood satellite imagery against post-flood satellite imagery to detect road changes or anomalies. With an accurate road segmentation algorithm, the difference between the post-flood segmentation and pre-flood segmentation can make a good detector for flooded roads.
The final algorithm works in four simple steps, outlined below.
- Generate road segmentation mask PreM from pre-hurricane satellite image tile.
- Generate road segmentation mask PostM from post-hurricane satellite image tile over the same area.
- Generate difference DiffM by subtracting PostM from PreM.
- Filter DiffM with ground truth street mask (derived from MapBox street map) to remove model prediction noise.
Finally, to generate the annotated tile overlay, the non-zero-valued pixels from the last step are shaded with opaque red while the zero-valued pixels are made fully transparent so the layer can be rendered on top of any base map for presentation.
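The steps above, including the overlay rendering, can be sketched with plain NumPy (the function and variable names here are illustrative, not taken from the actual codebase):

```python
import numpy as np

def flood_overlay(pre_mask, post_mask, street_mask):
    """Annotate suspected flooded roads from binary road masks.

    pre_mask, post_mask, street_mask: 2-D uint8 arrays (1 = road pixel).
    Returns an RGBA overlay: opaque red where a road pixel disappeared,
    fully transparent elsewhere.
    """
    # DiffM = PreM - PostM, clipped so pixels that only appear
    # post-flood are ignored
    diff = np.clip(pre_mask.astype(np.int16) - post_mask.astype(np.int16), 0, 1)
    # filter with the ground-truth street mask to remove prediction noise
    diff = diff * street_mask
    # shade remaining pixels opaque red; everything else stays transparent
    overlay = np.zeros(diff.shape + (4,), dtype=np.uint8)
    overlay[diff == 1] = (255, 0, 0, 255)
    return overlay
```

Because the alpha channel is zero everywhere except the flagged pixels, the result can be composited over any base map.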
Choosing relevant training data
To train the road segmentation model, high quality satellite imagery and street maps from MapBox were used. Geographic locations were chosen to represent a diverse mix of street layouts, road types, and street density across the United States. The training data set consists of satellite and street map tile pairs from Boston, New York City, Atlantic City, Tampa, Miami, New Orleans, and L.A.
I chose to exclude the Houston area from the training set to ensure the final segmentation model has no knowledge about Houston. We will eventually be testing the model on the Houston area affected by Hurricane Harvey, which will provide evidence for model generalizability.
A total of 9,000 satellite-tile and street-mask-tile pairs were used in the training set. Each map tile is 512×512 pixels.
After randomization, I split the full data set into a training set and a validation set following an 80:20 split. I use early stopping to make sure the model does not overfit to the training data (i.e., training stops when the validation loss stops decreasing).
Picking the right model
As with image classification, convolutional neural networks (CNNs) have had enormous success on image segmentation. For this project, I experimented with variations of a state-of-the-art CNN model called the U-Net, originally invented for biomedical image segmentation.
The input to the U-Net is the 3-channel (RGB) satellite image array of size 512×512×3. I decided against downsampling the input images for two reasons. First, narrow roads become harder to detect after downsampling. Second, I aimed to train a single robust model that takes full advantage of high-resolution satellite imagery yet still performs reasonably well on low-resolution imagery. The model output is a binary segmentation mask of size 512×512, where a pixel value of 1 indicates the pixel belongs to a road and 0 indicates it does not.
Since the network is essentially performing binary classification (road pixels vs non-road pixels) for each pixel of the input image, sigmoid activation is used as the final output layer of the U-Net.
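As a toy illustration (with a made-up logits array), the final sigmoid plus a 0.5 threshold turn the network's raw per-pixel outputs into the binary road mask:

```python
import numpy as np

def to_binary_mask(logits, threshold=0.5):
    # sigmoid maps each pixel's raw score to a road probability in (0, 1);
    # thresholding then yields the binary segmentation mask
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs > threshold).astype(np.uint8)
```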
I chose the cost function based on the negative of the Dice coefficient:

loss = −(2·|X ∩ Y| + smooth) / (|X| + |Y| + smooth)

where X is the predicted mask and Y is the ground-truth mask. The smooth variable prevents division by zero.
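A minimal NumPy sketch of this loss (the training version operates on framework tensors, but the arithmetic is the same; `smooth=1.0` is an assumed default):

```python
import numpy as np

def dice_coef(y_true, y_pred, smooth=1.0):
    # soft Dice: 2*|X ∩ Y| / (|X| + |Y|), smoothed to avoid division by zero
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    # training minimizes the negative Dice coefficient
    return -dice_coef(y_true, y_pred)
```

Note that with the smoothing term, two empty masks score a perfect Dice of 1, which is the desired behavior for tiles with no roads.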
Adam, the model optimizer, was initialized with a learning rate of 0.0001, and the optimizer automatically reduces the learning rate by a factor of ten when the cost function on the validation set fails to decrease for two consecutive epochs. Training stops when no improvement is observed over three epochs.
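The schedule can be mimicked in a few lines of plain Python (a sketch of the reduce-on-plateau-plus-early-stopping behavior, not the actual training loop; parameter names are illustrative):

```python
def run_schedule(val_losses, lr=1e-4, factor=0.1, lr_patience=2, stop_patience=3):
    """Replay a sequence of per-epoch validation losses; return the
    learning rate used each epoch and the epoch training stops on."""
    best = float("inf")
    epochs_since_improvement = 0
    lrs = []
    for epoch, loss in enumerate(val_losses):
        lrs.append(lr)
        if loss < best:
            best, epochs_since_improvement = loss, 0
            continue
        epochs_since_improvement += 1
        if epochs_since_improvement == lr_patience:
            lr *= factor  # drop the learning rate by a factor of ten
        if epochs_since_improvement >= stop_patience:
            return lrs, epoch  # early stopping
    return lrs, len(val_losses) - 1
```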
After training different variants, I eventually settled on a U-Net with dilated convolutions. The benefit of dilated convolution is that it enlarges the receptive field so that large contextual features are recognized. When multiple dilated convolutional layers are stacked, the receptive field expands exponentially. Roads typically span an entire map tile, so intuitively a wider receptive field should be advantageous. Empirically, dilation slightly improves the Dice coefficient.
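The growth is easy to verify with simple receptive-field arithmetic: each 3×3 convolution with dilation d widens the receptive field by 2d pixels, so doubling the dilation at every layer compounds quickly.

```python
def receptive_field(dilations, kernel=3):
    # each k x k conv with dilation d adds (k - 1) * d pixels of context
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf
```

Three plain 3×3 convolutions see a 7-pixel window; the same three layers with dilations 1, 2, 4 see 15 pixels, and four layers with dilations 1, 2, 4, 8 see 31.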
While the model's inference performance on the MapBox validation set is excellent (Dice coefficient > 0.7), the model initially performed badly on post-hurricane images. Upon visual inspection, I concluded the issue was a difference in image quality: the training images were taken under essentially ideal atmospheric conditions (cloud-free and blur-free), whereas post-hurricane images are slightly blurry and noisy.
To address that deficiency, I applied Gaussian blur to a randomly chosen subset of the training images to simulate variance in image quality and re-trained the model. The re-trained model performed significantly better on post-hurricane imagery because it is robust to variations in image quality and resolution.
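A sketch of that augmentation for single-channel images (the `fraction` and `sigma` values and the separable NumPy blur are illustrative; the real pipeline may use a library blur instead):

```python
import numpy as np

def blur_random_subset(images, fraction=0.3, sigma=1.0, seed=0):
    """Apply Gaussian blur to a random subset of (N, H, W) images to
    simulate lower-quality post-hurricane captures."""
    rng = np.random.default_rng(seed)
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()  # 1-D Gaussian, applied separably (rows, then cols)
    out = images.astype(float).copy()
    chosen = rng.random(len(images)) < fraction
    for i in np.flatnonzero(chosen):
        img = np.apply_along_axis(np.convolve, 1, out[i], kernel, mode="same")
        img = np.apply_along_axis(np.convolve, 0, img, kernel, mode="same")
        out[i] = img
    return out, chosen
```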
In addition, to increase the size of the training data set, each satellite/street-map pair is randomly flipped and rotated at right angles to generate 7 additional variants of the image. Augmentation boosted the Dice coefficient by approximately 0.03. Image augmentation is executed in batches on the CPU, in parallel with model training on the GPU, to optimize training time.
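The flip-and-rotate augmentation (the 8 symmetries of the square, i.e. 7 extra variants per original pair) can be sketched as:

```python
import numpy as np

def d4_variants(sat, street):
    """Return the 8 right-angle-rotation/flip variants of a satellite
    tile and its street mask, transformed in lockstep so each image
    stays aligned with its label."""
    pairs = []
    for k in range(4):
        rot_sat, rot_street = np.rot90(sat, k), np.rot90(street, k)
        pairs.append((rot_sat, rot_street))
        pairs.append((np.fliplr(rot_sat), np.fliplr(rot_street)))
    return pairs
```

Transforming the satellite tile and street mask together is essential; augmenting only the input would corrupt the labels.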
A summary of results can be found below. Note that the best model on hurricane images (second-to-last row) does not obtain the best Dice score on the validation set, since only clean satellite images were available for validation.
Conclusion and future directions
- The U-Net model was originally invented for biomedical image segmentation but it has proven to be highly applicable to satellite imagery.
- The accuracy of the final annotation greatly depends on the performance of the road segmentation model. My road segmentation model can be further improved by training with a more diverse dataset (particularly, images from a more diverse set of satellite sensors with different resolutions and diverse geographies).
- Due to the lack of an annotated dataset and the high cost of creating one (e.g., by manually annotating flooded roads on satellite imagery through a crowdsourcing service), I was not able to quantitatively evaluate the accuracy of the final annotation. Visual inspection, however, shows the approach yields very good results.
Finally, the generated map annotation can be used to quickly highlight post-disaster (hurricanes, landslides, earthquakes, etc.) road anomalies on a map. I am open-sourcing this work to allow others to build upon it and make a production-ready product.