Toy Dataset
I think RetinaNet is a good starting point, for RetinaNet is relatively simple to implement and its performance is fairly good across different public datasets. To begin, I have artificially synthesized a very simple toy dataset to learn object detection methods and principles.
A sample of what is inside the toy dataset is shown above. Black background with three objects, namely a circle, a rectangle and a triangle. These shapes can have various sizes. Their areas have to be greater than minimum area threshold. The rationale is that I have realized if the objects are too tiny, we need to either create more tiny anchor boxes to cover them or expand the feature pyramid levels to cover a wider range of object sizes. The threshold I picked seems to work well; the circle, rectangle and triangles are not too big and not too small on the input canvas of 224 by 224. The objects can have one of the following colors: red, green and blue. Thus, we see:
- Red circle
- Red rectangle
-
Red triangle
- Blue circle
- Blue rectangle
-
Blue triangle
- Green circle
- Green rectangle
- Green triangle
The locations, shapes, and colors are all independently and randomly generated. To draw a circle, I need to generate a random radius value. The “area” is not strictly the area of a circle, but it refers to the area of its bounding boxes. Nonetheless, such differences are negligible for this task. To draw a rectangle, I need to generate random corner coordinates. A rectangle can be a square if its width is as long as its height.
The hardest shape to draw is a triangle. The three corners of the triangle cannot be on a plane or anywhere close to it as it can make the triangle looks very flat or like a line. The corners are constrained such that one corner cannot be too far or too close from another corner. To do so, I have generated the corners one by one as I continuously check whether constrains are met. A balltree algorithm is a good way to quickly work out the distances of all nearly points inside a neighborhood of a certain radius. Another check is to make sure corners are not aligned on a single plane, so that the triangle looks like a triangle.
OpenCV have functions for drawing circle and rectangle directly, but not triangle. However, it is easy because we just need to first connect the three corners by lines and fill up the enclosed area. 5000 and 100 images are generated for training and testing respectively.
The way this toy dataset is generated has some ramification. Currently, color and shape are independent. Later, because of this independence, I have trained a shape detector and a color classifier separately. Furthermore, I have only tried one object for each shape. It can easily be extended to more objects with different shapes. I would recommend my readers to try with more objects and to observe how the detector handles them.
import os
import numpy as np
import cv2
import random
from sklearn.neighbors import BallTree
from matplotlib import pyplot as plt
MIN_AREA = 400 # the bounding box's area
def generate_corners(W, ratio=0.1):
x1 = random.sample(range(W), 1)[0]
x2 = x1 + random.sample(range(int(W*ratio)), 1)[0] + 1 # at least 1 pixel further
return x1, x2
def add_circle(img, ratio=0.1):
while True:
row, col = random.sample(range(W), 2)
color, color_label = random_color()
r = random.sample(range(int(W*ratio)), 1)[0] + 1
y1 = col - r
y2 = col + r
x1 = row - r
x2 = row + r
area = (x2 - x1) * (y2 - y1)
print (area)
if area > MIN_AREA:
break
# -1 means filled
cv2.circle(img, (row, col), r, color, -1)
return img, color_label, x1, x2, y1, y2
def random_color():
c = random.choice((0,1,2))
arr = np.zeros(3)
arr[c] = 255
# cv2 takes tuple
return tuple(arr), c
def add_rectangle(img):
H, W,_ = img.shape
while True:
x1, x2 = generate_corners(W)
y1, y2 = generate_corners(H)
area = (x2 - x1) * (y2 - y1)
if area > MIN_AREA:
break
color, color_label = random_color()
cv2.rectangle(img, (x1, y1), (x2, y2), color, -1)
return img, color_label, x1-3, x2+3, y1-3, y2+3 # not to tight
def not_being_too_close(q2, dist):
q2 = q2[dist >= np.percentile(dist, 80)] # higher than 80% of samples
return q2
def come_up_with_co(x1, y1, r, co, tree):
q2, dist = tree.query_radius([[x1, y1]], r=r, return_distance=True)
q2 = q2[0]
dist = dist[0]
coq2 = np.array([co[item] for item in q2])
# avoid points on the same plane
mask = np.squeeze(np.dstack([coq2[:,0] != x1, coq2[:,1] != y1]))
try:
mask = np.all(mask, axis=1)
except IndexError:
redo = True
else:
redo = False
q2 = q2[mask]
dist = dist[mask]
q2 = not_being_too_close(q2, dist)
if redo:
return (redo, None)
else:
return (redo, q2)
def add_triangle(img, tree, co, W):
while True:
redo = True
while redo:
x1, y1 = generate_corners(W)
ans = come_up_with_co(x1, y1, 30,co, tree)
redo = ans[0]
q2 = ans[1]
redo = True
while redo:
vertices2 = random.sample(list(q2), 1)
x2, y2 = co[vertices2[0]]
ans = come_up_with_co(x2, y2, 30, co, tree)
redo = ans[0]
q3 = ans[1]
vertices3 = random.sample(set(q3), 1)
x3, y3 = co[vertices3[0]]
#make sure the datatype is correct for cv
vertices = np.array([[x1,y1], [x2, y2], [x3, y3]], dtype=np.int32)
pts = vertices.reshape((-1,1,2))
color, color_label = random_color()
bbx1 = min([x1, x2, x3])
bby1 = min([y1,y2, y3])
bbx2 = max([x1, x2, x3])
bby2 = max([y1,y2, y3])
area = (x2 - x1) * (y2 - y1)
if area > MIN_AREA:
break
cv2.polylines(img, [pts], isClosed=True, color=color, thickness=1)
cv2.fillPoly(img, [pts], color=color)
return img, color_label, bbx1, bbx2, bby1, bby2
H,W = 224, 224
MAX_SAMPLES = 40
output_dir = .......
shape_list = []
color_list = []
color_dict = {0:'red',
1:'green',
2:'blue'}
# setup a grid
x = np.arange(W)
y = np.arange(H)
X, Y = np.meshgrid(x,y)
co = np.dstack([X.flatten(), Y.flatten()])
co = np.squeeze(co)
tree = BallTree(co)
for counter in range(MAX_SAMPLES):
img = np.zeros((H,W,3))
img, color_label, x1, x2, y1, y2 = add_circle(img)
img, color_label, x1, x2, y1, y2 = add_rectangle(img)
img, color_label, x1, x2, y1, y2 = add_triangle(img, tree, co, W)
plt.figure()
plt.imshow(img)
# plt.show()
plt.savefig(os.path.join(output_dir, str(counter) + '.png'),
transparent=True)
del img