# Vehicle Detection Project
The goals / steps of this project are the following:
- Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a Linear SVM classifier
- Optionally, you can also apply a color transform and append binned color features, as well as histograms of color, to your HOG feature vector.
- Note: for those first two steps don't forget to normalize your features and randomize a selection for training and testing.
- Implement a sliding-window technique and use your trained classifier to search for vehicles in images.
- Run your pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
- Estimate a bounding box for vehicles detected.
## Rubric Points
Here I will consider the rubric points individually and describe how I addressed each point in my implementation.
### Writeup / README
#### 1. Provide a Writeup / README that includes all the rubric points and how you addressed each one. You can submit your writeup as markdown or pdf. Here is a template writeup for this project you can use as a guide and a starting point.
You're reading it!
### Histogram of Oriented Gradients (HOG)
#### 1. Explain how (and identify where in your code) you extracted HOG features from the training images.
The code for extracting HOG features is in lines 68-79 of `utils.py`:
```python
# Define a function to return HOG features and visualization
# (note: `visualise` was renamed `visualize` in scikit-image >= 0.15)
from skimage.feature import hog

def get_hog_features(img, orient, pix_per_cell, cell_per_block, vis=False, feature_vec=True):
    if vis == True:
        features, hog_image = hog(img, orientations=orient, pixels_per_cell=(pix_per_cell, pix_per_cell),
                                  cells_per_block=(cell_per_block, cell_per_block), transform_sqrt=False,
                                  visualise=True, feature_vector=False)
        return features, hog_image
    else:
        features = hog(img, orientations=orient, pixels_per_cell=(pix_per_cell, pix_per_cell),
                       cells_per_block=(cell_per_block, cell_per_block), transform_sqrt=False,
                       visualise=False, feature_vector=feature_vec)
        return features
```
I started by reading in all the `vehicle` and `non-vehicle` images. Here is an example of one of each of the `vehicle` and `non-vehicle` classes:
I then explored different color spaces and different `skimage.hog()` parameters (`orientations`, `pixels_per_cell`, and `cells_per_block`). I grabbed random images from each of the two classes and displayed them to get a feel for what the `skimage.hog()` output looks like.

Here is an example using the `RGB` color space and HOG parameters of `orientations=8`, `pixels_per_cell=(8, 8)` and `cells_per_block=(2, 2)`:
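A visualization like the one above can be produced with a call along these lines. This is a minimal sketch, not code from my pipeline: the example image path and the plotting layout are assumptions.

```python
# Sketch: reproduce the HOG visualization above (image path is hypothetical).
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import utils

car_img = mpimg.imread('vehicles/GTI_Far/image0000.png')  # hypothetical path
_, car_hog = utils.get_hog_features(car_img[:, :, 0], orient=8,
                                    pix_per_cell=8, cell_per_block=2, vis=True)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.imshow(car_img)
ax1.set_title('Example Car Image')
ax2.imshow(car_hog, cmap='gray')
ax2.set_title('HOG Visualization')
plt.show()
```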
#### 2. Explain how you settled on your final choice of HOG parameters.
I tried various combinations of parameters, extracted features with each, used those features to train an SVM classifier, and picked the combination that performed best (i.e., had the highest test accuracy).
Here are my final parameters:
```python
colorspace = 'YUV'   # Can be RGB, HSV, LUV, HLS, YUV, YCrCb
orient = 11
pix_per_cell = 16
cell_per_block = 2
hog_channel = 'ALL'  # Can be 0, 1, 2, or "ALL"
```
#### 3. Describe how (and identify where in your code) you trained a classifier using your selected HOG features (and color features if you used them).

The code for training the classifier is in the file `train.py`.
I trained a linear SVM using the default parameters with only the HOG features, and got an accuracy of 0.98 on the test set.
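A condensed sketch of the flow in `train.py` follows. The dataset paths and helper structure are assumptions; the HOG parameters are the ones chosen above.

```python
# Condensed sketch of train.py (data paths and helper layout are assumptions).
import glob
import cv2
import numpy as np
import matplotlib.image as mpimg
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
import utils

def extract_hog_features(files):
    # All-channel YUV HOG with the final parameters chosen above
    features = []
    for f in files:
        img = mpimg.imread(f)
        yuv = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)
        channel_feats = [utils.get_hog_features(yuv[:, :, ch], orient=11,
                                                pix_per_cell=16, cell_per_block=2)
                         for ch in range(3)]
        features.append(np.hstack(channel_feats))
    return np.array(features)

cars = glob.glob('vehicles/**/*.png', recursive=True)        # assumed dataset layout
notcars = glob.glob('non-vehicles/**/*.png', recursive=True)

X = np.vstack((extract_hog_features(cars), extract_hog_features(notcars)))
y = np.hstack((np.ones(len(cars)), np.zeros(len(notcars))))

# Randomized train/test split, per the note in the project goals
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

svc = LinearSVC()  # default parameters
svc.fit(X_train, y_train)
print('Test accuracy:', round(svc.score(X_test, y_test), 4))
```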
### Sliding Window Search
#### 1. Describe how (and identify where in your code) you implemented a sliding window search. How did you decide what scales to search and how much to overlap windows?
The code implementing the sliding window search is in lines 25-192 of `pipeline.py`.
I tried out a lot of search tactics and chose a combination of four levels of sliding windows (a sketch of how the levels combine follows below):

Level 1 has sliding windows with 0.75 overlap and a scale of 1.0; the windows look like this:

Level 2 has sliding windows with 0.75 overlap and a scale of 1.5; the windows look like this:

Level 3 has sliding windows with 0.75 overlap and a scale of 2.0; the windows look like this:

Level 4 has sliding windows with 0.75 overlap and a scale of 3.5; the windows look like this:
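Combining the four levels amounts to calling the `find_cars` routine (shown in the next section) once per level and pooling the returned windows. This is a sketch, not my exact configuration: the `ystart`/`ystop` bands are assumptions, following the usual pattern of searching smaller scales nearer the horizon.

```python
# Sketch: run find_cars once per level and pool the windows.
# Given: img (RGB frame), svc and X_scaler from training.
# The ystart/ystop bands are assumptions.
search_levels = [
    # (ystart, ystop, scale)
    (400, 464, 1.0),
    (400, 496, 1.5),
    (400, 528, 2.0),
    (400, 596, 3.5),
]

windows = []
for ystart, ystop, scale in search_levels:
    windows += find_cars(img, ystart, ystop, scale, 'YUV', 'ALL', svc, X_scaler,
                         orient=11, pix_per_cell=16, cell_per_block=2,
                         spatial_size=None, hist_bins=None)
```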
#### 2. Show some examples of test images to demonstrate how your pipeline is working. What did you do to optimize the performance of your classifier?
Extracting HOG features is time-consuming. To improve performance, I defined the `find_cars` method, which extracts the HOG features only once per image and then sub-samples that array for each window before feeding the result to the classifier.
```python
# Excerpt from pipeline.py
import cv2
import numpy as np
import utils

def find_cars(img, ystart, ystop, scale, cspace, hog_channel, svc, X_scaler, orient,
              pix_per_cell, cell_per_block, spatial_size, hist_bins, show_all_rectangles=False):
    # array of rectangles where cars were detected
    windows = []
    img = img.astype(np.float32) / 255
    img_tosearch = img[ystart:ystop, :, :]
    # apply color conversion if other than 'RGB'
    if cspace != 'RGB':
        if cspace == 'HSV':
            ctrans_tosearch = cv2.cvtColor(img_tosearch, cv2.COLOR_RGB2HSV)
        elif cspace == 'LUV':
            ctrans_tosearch = cv2.cvtColor(img_tosearch, cv2.COLOR_RGB2LUV)
        elif cspace == 'HLS':
            ctrans_tosearch = cv2.cvtColor(img_tosearch, cv2.COLOR_RGB2HLS)
        elif cspace == 'YUV':
            ctrans_tosearch = cv2.cvtColor(img_tosearch, cv2.COLOR_RGB2YUV)
        elif cspace == 'YCrCb':
            ctrans_tosearch = cv2.cvtColor(img_tosearch, cv2.COLOR_RGB2YCrCb)
    else:
        ctrans_tosearch = np.copy(img_tosearch)
    # rescale image if other than 1.0 scale
    if scale != 1:
        imshape = ctrans_tosearch.shape
        ctrans_tosearch = cv2.resize(ctrans_tosearch, (int(imshape[1] / scale), int(imshape[0] / scale)))
    # select colorspace channel for HOG
    if hog_channel == 'ALL':
        ch1 = ctrans_tosearch[:, :, 0]
        ch2 = ctrans_tosearch[:, :, 1]
        ch3 = ctrans_tosearch[:, :, 2]
    else:
        ch1 = ctrans_tosearch[:, :, hog_channel]
    # Define blocks and steps as above
    nxblocks = (ch1.shape[1] // pix_per_cell) + 1  # -1
    nyblocks = (ch1.shape[0] // pix_per_cell) + 1  # -1
    nfeat_per_block = orient * cell_per_block ** 2
    # 64 was the original sampling rate, with 8 cells and 8 pix per cell
    window = 64
    nblocks_per_window = (window // pix_per_cell) - 1
    cells_per_step = 2  # Instead of overlap, define how many cells to step
    nxsteps = (nxblocks - nblocks_per_window) // cells_per_step
    nysteps = (nyblocks - nblocks_per_window) // cells_per_step
    # Compute individual channel HOG features for the entire image
    hog1 = utils.get_hog_features(ch1, orient, pix_per_cell, cell_per_block, feature_vec=False)
    if hog_channel == 'ALL':
        hog2 = utils.get_hog_features(ch2, orient, pix_per_cell, cell_per_block, feature_vec=False)
        hog3 = utils.get_hog_features(ch3, orient, pix_per_cell, cell_per_block, feature_vec=False)
    for xb in range(nxsteps):
        for yb in range(nysteps):
            ypos = yb * cells_per_step
            xpos = xb * cells_per_step
            # Extract HOG for this patch
            hog_feat1 = hog1[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window].ravel()
            if hog_channel == 'ALL':
                hog_feat2 = hog2[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window].ravel()
                hog_feat3 = hog3[ypos:ypos + nblocks_per_window, xpos:xpos + nblocks_per_window].ravel()
                hog_features = np.hstack((hog_feat1, hog_feat2, hog_feat3))
            else:
                hog_features = hog_feat1
            xleft = xpos * pix_per_cell
            ytop = ypos * pix_per_cell
            test_prediction = svc.predict(hog_features.reshape(1, -1))
            if test_prediction == 1 or show_all_rectangles:
                xbox_left = int(xleft * scale)
                ytop_draw = int(ytop * scale)
                win_draw = int(window * scale)
                windows.append(
                    ((xbox_left, ytop_draw + ystart), (xbox_left + win_draw, ytop_draw + win_draw + ystart)))
    return windows
```
Ultimately I searched on four scales using YUV 3-channel HOG features, which provided a nice result. Here are the results on the test images:
### Video Implementation

#### 1. Provide a link to your final video output. Your pipeline should perform reasonably well on the entire project video (somewhat wobbly or unstable bounding boxes are ok as long as you are identifying the vehicles most of the time with minimal false positives.)
Here's a link to my video result
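For reference, the video can be produced by mapping the per-frame pipeline over the clip. This sketch assumes moviepy (1.x API); `process_frame` is a hypothetical stand-in for the frame handler in `pipeline.py`.

```python
# Sketch: map the per-frame pipeline over the video with moviepy (1.x API).
# process_frame is a hypothetical stand-in for the frame handler in pipeline.py.
from moviepy.editor import VideoFileClip

clip = VideoFileClip('project_video.mp4')
out_clip = clip.fl_image(process_frame)  # expects and returns an RGB frame
out_clip.write_videofile('project_video_out.mp4', audio=False)
```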
#### 2. Describe how (and identify where in your code) you implemented some kind of filter for false positives and some method for combining overlapping bounding boxes.
I recorded the positions of positive detections in each frame of the video. From the positive detections I created a heatmap and then thresholded that map to identify vehicle positions. I then used `scipy.ndimage.measurements.label()` to identify individual blobs in the heatmap, assumed each blob corresponded to a vehicle, and constructed bounding boxes to cover the area of each blob detected.

Here's an example result showing the heatmap from the test images, the result of `scipy.ndimage.measurements.label()`, and the bounding boxes overlaid on the images:

Here is the output of `scipy.ndimage.measurements.label()` on the integrated heatmap:

Here the resulting bounding boxes are drawn onto the test images:
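In condensed form, the heatmap filtering works like this. This is a sketch: `windows` is the pooled output of `find_cars`, and the threshold value is an assumption, not my tuned setting.

```python
# Sketch of the heatmap -> threshold -> label -> bounding-box flow.
# Given: img (RGB frame) and windows (pooled output of find_cars).
import numpy as np
import cv2
from scipy.ndimage.measurements import label

def add_heat(heatmap, windows):
    # Add +1 for every pixel inside each positive detection window
    for ((x1, y1), (x2, y2)) in windows:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    # Zero out pixels that were not detected often enough
    heatmap[heatmap <= threshold] = 0
    return heatmap

heat = np.zeros(img.shape[:2], dtype=np.float32)
heat = apply_threshold(add_heat(heat, windows), 1)  # threshold is an assumption
labels = label(heat)  # (labeled array, number of blobs found)

# Draw one bounding box per labeled blob
for car_number in range(1, labels[1] + 1):
    ys, xs = np.nonzero(labels[0] == car_number)
    cv2.rectangle(img, (int(xs.min()), int(ys.min())),
                  (int(xs.max()), int(ys.max())), (0, 0, 255), 6)
```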
### Discussion
#### 1. Briefly discuss any problems / issues you faced in your implementation of this project. Where will your pipeline likely fail? What could you do to make it more robust?
The main problem I faced in my implementation of this project was finding the right combination of sliding window search tactics. It took me a lot of time trying different tactics to get the final result.

My pipeline may fail in conditions where there are a lot of cars on the street, where the positive detection boxes on each car will probably pile up on one another. And in conditions like fog or rain, where the shape of a car can't be properly captured, my classifier may not be capable of making the right prediction.

Using both color features and HOG features to train the classifier could improve prediction accuracy under more complicated conditions and make my pipeline more robust. A sketch of such a combined feature extractor follows.
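This is a minimal sketch, not code from my pipeline: it assumes 32x32 spatial binning and 32 histogram bins (values from the course material) and reuses `utils.get_hog_features` with the parameters chosen earlier.

```python
# Hypothetical sketch: concatenate spatial color, color-histogram, and HOG features.
# Bin sizes (32x32 spatial, 32 histogram bins) are assumptions, not tuned values.
import cv2
import numpy as np
import utils

def extract_combined_features(img_rgb):
    yuv = cv2.cvtColor(img_rgb, cv2.COLOR_RGB2YUV)
    # Binned spatial color features
    spatial = cv2.resize(yuv, (32, 32)).ravel()
    # Per-channel color histograms (range assumes 8-bit pixel values)
    hists = [np.histogram(yuv[:, :, ch], bins=32, range=(0, 256))[0]
             for ch in range(3)]
    # All-channel HOG with the same parameters as before
    hog_feats = [utils.get_hog_features(yuv[:, :, ch], orient=11,
                                        pix_per_cell=16, cell_per_block=2)
                 for ch in range(3)]
    return np.concatenate([spatial] + hists + hog_feats)
```

Because the three feature types live on very different numeric scales, normalizing the stacked training matrix (e.g., with `sklearn.preprocessing.StandardScaler`) would then be essential, as the note in the project goals already recommends.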