
ABSTRACT

 

There have been several improvements in object detection and semantic segmentation in recent years. The baseline systems that drive these advances are Fast/Faster R-CNN, the Fully Convolutional Network, and more recently Mask R-CNN and its variant with a weight transfer function. Mask R-CNN is the current state of the art. This research extends the application of these state-of-the-art object detection and semantic segmentation methods to drone-based datasets. Existing drone datasets were used to learn semantic segmentation on drone images with Mask R-CNN.
This work is the result of my own activity. I have neither given nor received unauthorized assistance on this work.
Key words and phrases: instance segmentation, object detection, CNN, Mask R-CNN, drone programming, computer vision

 

TABLE OF CONTENTS

 

1 Introduction
1.1 Introduction
2 Theoretical Aspects of Image Classification
2.1 Computer Vision Tasks
2.1.1 Image Classification
2.1.2 Object Detection
2.1.3 Semantic Segmentation
2.1.4 Instance Segmentation
2.2 CNN for Object Detection and Segmentation
2.2.1 Convolutional Neural Networks (CNN)
2.2.2 Mask R-CNN
2.3 Drone-Based Dataset
3 Used Technologies
3.1 Python
3.2 TensorFlow
3.3 Keras
3.4 Scikit-Image
4 Result and Analysis: Implementation
4.1 Dataset
4.2 Mask R-CNN Library
4.2.1 Config.py
4.2.2 Model.py
4.2.3 Utils.py
4.2.4 Drone.py
4.2.5 Drone-detect.py
5 Conclusion
A Principal program codes

 

CHAPTER ONE

 

Introduction
1.1 Introduction
Images and videos are collected every day from many different sources. Recognizing objects and segmenting, localizing, and classifying them has long been a major area of interest in computer vision. Significant progress has been made, commencing from the use of low-level image features, such as the scale-invariant feature transform (SIFT) [Lowe, 2004] and histograms of oriented gradients (HOG) [Dalal and Triggs, 2005], in sophisticated machine learning frameworks, to the use of multi-layer convolutional networks to compute highly discriminative and invariant features [Girshick et al., 2015]. SIFT and HOG are feature descriptors built on semi-local orientation histograms that count occurrences of gradient orientations in localized portions of an image. Just as the Convolutional Neural Network (CNN) is traced to Fukushima’s “neocognitron” [Krizhevsky et al., 2012], a hierarchical and shift-invariant model for pattern recognition, the use of CNNs for region-based identification (R-CNN) [Girshick et al., 2015] can also be traced back to the same. After CNNs fell out of favor in the 1990s with the rise of the support vector machine (SVM), they were revitalized in 2012 by [Krizhevsky et al., 2012], who demonstrated a substantial improvement in image classification accuracy on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [Deng et al., 2012] and introduced new mechanisms to CNNs, such as the rectified linear unit (ReLU) and dropout regularization. To perform object detection with a CNN, and in an attempt to bridge the gap between image segmentation and object detection, two issues were addressed by [Girshick et al., 2015]: localizing objects with a deep network, and training a high-capacity model with only a small quantity of annotated detection data. A sliding-window detector was considered for localization but was not preferred, because it works only for detecting a single object unless all objects in an image share a common aspect ratio. Instead, the localization problem was solved by operating within the “recognition using regions” paradigm.
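The “recognition using regions” pipeline can be illustrated with a toy sketch: propose candidate boxes, warp each to a fixed size, extract a feature vector, and classify every proposal independently. The functions below are illustrative stand-ins, not this project's code; an actual R-CNN uses selective-search proposals and a trained ConvNet rather than the random boxes and naive resampling shown here.

```python
import numpy as np

def propose_regions(image, num_proposals=5, seed=0):
    """Stand-in for a region-proposal method such as selective search:
    returns candidate boxes as (x1, y1, x2, y2) in pixel coordinates."""
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    x1 = rng.integers(0, w // 2, num_proposals)
    y1 = rng.integers(0, h // 2, num_proposals)
    x2 = np.minimum(x1 + rng.integers(8, w // 2, num_proposals), w)
    y2 = np.minimum(y1 + rng.integers(8, h // 2, num_proposals), h)
    return np.stack([x1, y1, x2, y2], axis=1)

def extract_features(crop, size=(7, 7)):
    """Stand-in for the CNN feature extractor: warp the crop to a fixed
    size by subsampling and flatten it into a fixed-length vector."""
    ys = np.linspace(0, crop.shape[0] - 1, size[0]).astype(int)
    xs = np.linspace(0, crop.shape[1] - 1, size[1]).astype(int)
    return crop[np.ix_(ys, xs)].ravel()

def rcnn_detect(image, classify):
    """R-CNN-style loop: propose regions, warp each proposal to a fixed
    size, extract features, and score every proposal independently."""
    detections = []
    for (x1, y1, x2, y2) in propose_regions(image):
        crop = image[y1:y2, x1:x2]
        score = classify(extract_features(crop))
        detections.append(((x1, y1, x2, y2), score))
    return detections
```

Passing any scoring function as `classify` mirrors how R-CNN originally paired CNN features with per-class linear classifiers; each proposal is handled on its own, which is exactly the per-region cost that later work set out to amortize.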
Fast R-CNN was introduced in 2015 by Girshick [Ross, 2015]. It demonstrated a single-stage training algorithm that jointly learns to classify object proposals and refine their spatial locations. This tackled the complexity that arises in other deep ConvNets [Krizhevsky et al., 2012][Girshick et al., 2013][Zhu et al., 2015] from their slow multi-stage pipelines. The slowness stems from the fact that detection requires accurate localization of objects, which creates two challenges: many proposals (candidate object locations) must be processed, and these proposals provide only rough localization that must be refined to achieve precise localization. Fast R-CNN is 9× faster than R-CNN
[Girshick et al., 2015] and 3× faster than SPPnet [He et al., 2014]. R-CNN was sped up by spatial pyramid pooling networks (SPPnets) [He et al., 2014] through shared computation. The SPPnet method computes a convolutional feature map for the entire input image and then classifies each object proposal using a feature vector extracted from the shared feature map. SPPnet also has obvious pitfalls. Like R-CNN, it is a multi-stage pipeline that involves extracting features, fine-tuning a network with log loss, training SVMs, and finally fitting bounding-box regressors; features are also written to disk. But unlike R-CNN, the fine-tuning algorithm in SPPnet cannot update the convolutional layers that precede the spatial pyramid pooling, and this constraint limits the accuracy of very deep networks. Additional
efforts were made to reduce the running time of deep ConvNets for object detection and segmentation. Region proposal computation is the root of this expensive running time in detection networks. A fully convolutional network that simultaneously predicts object bounds and objectness scores at each position, called the Region Proposal Network (RPN), was developed by Ren et al. [Ren et al., 2015]. The RPN shares full-image convolutional features with the detection network, permitting virtually cost-free region proposals, and it is trained end-to-end to generate high-quality region proposals. Integrating the RPN and Fast R-CNN into a single network by sharing their convolutional features results in Faster R-CNN. Anchor boxes that act as references at multiple scales and aspect ratios were introduced in Faster R-CNN in place of the pyramids of filters used in earlier methods; RPNs are designed to predict region proposals across a wide range of scales and aspect ratios. Changing the pyramid architecture to a top-down architecture with lateral connections improved its efficiency [Lin et al., 2016]. This is applied in building high-level semantic feature maps at all scales. The new architecture is called the Feature Pyramid Network (FPN) [Lin et al., 2016]. As a generic feature extractor it displayed notable improvements in various applications, and when used in a Faster R-CNN it achieved results that surpass those of Faster R-CNN alone. To generate a high-quality segmentation mask for each object instance in an image, Mask R-CNN was developed [He et al., 2017]. Mask R-CNN adds another branch to Faster R-CNN: in addition to the bounding-box recognition system, a branch for predicting an object mask runs in parallel. It adds only a small overhead to Faster R-CNN, running at 5 fps.
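A defining detail of the parallel mask branch is that it predicts one binary mask per class for each region of interest and, at inference time, keeps only the mask of the class chosen by the box-recognition branch, so mask prediction and class prediction are decoupled. A minimal sketch of that selection step, with illustrative array names (not this project's code), might look like this:

```python
import numpy as np

def select_masks(mask_logits, class_ids, threshold=0.5):
    """Mask R-CNN-style mask selection: `mask_logits` holds one m*m mask
    per class for each RoI (shape [num_rois, num_classes, m, m]); the
    classification head supplies `class_ids`, and only the mask of the
    predicted class is kept and binarised with a per-pixel sigmoid.
    Selecting per-class masks this way avoids competition between
    classes inside the mask branch."""
    rois = np.arange(mask_logits.shape[0])
    per_class = mask_logits[rois, class_ids]      # [num_rois, m, m]
    probs = 1.0 / (1.0 + np.exp(-per_class))      # per-pixel sigmoid
    return (probs >= threshold).astype(np.uint8)  # binary masks
```

In the actual model the per-pixel sigmoid is paired with a binary cross-entropy loss on the ground-truth class's mask only, which is what makes the branch class-agnostic at training time.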
The accessibility and use of drone technology is currently on the rise. Drones are tackling challenges in spheres such as defence, shipping of consumer goods, disease control, and event coverage. One of the most important applications of drones is the collection of images and videos, and the data collected can be used for many purposes. This work extends the state-of-the-art Mask R-CNN to the segmentation of objects in images collected by a drone. It detects about 22 classes, including tree, grass, other vegetation, dirt, gravel, rocks, water, paved area, pool, person, dog, car, bicycle, roof, wall, fence, fence-pole, window, door, and obstacle. For training the model, high-resolution images acquired at 1 Hz with pixel-accurate annotation were used.
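For orientation, the class list above can be captured in a small configuration fragment. The `CLASS_NAMES` variable and the background-at-index-0 convention follow common Mask R-CNN implementations and are assumptions for illustration, not code taken from this project:

```python
# Semantic classes named for the drone dataset; index 0 is the
# conventional background ("BG") class used by Mask R-CNN-style
# detectors, so the detector predicts NUM_CLASSES outputs in total.
CLASS_NAMES = [
    "BG", "tree", "grass", "other vegetation", "dirt", "gravel",
    "rocks", "water", "paved area", "pool", "person", "dog", "car",
    "bicycle", "roof", "wall", "fence", "fence-pole", "window",
    "door", "obstacle",
]

NUM_CLASSES = len(CLASS_NAMES)  # background + object classes
```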
Chapter 2 of this work discusses the theory of CNNs and Mask R-CNN in depth: first the backbone of Mask R-CNN, followed by the Region Proposal Network, the ROI classifier and bounding-box regressor, and lastly the segmentation mask; it also covers the drone-based dataset. Chapter 3 explores the technologies used, and in Chapter 4 the implementation of the work is explained fully with the aid of [Presek and Landa, 2018].
