In part one of the XKCD font saga I gave some background on the XKCD handwriting dataset, and took an initial look at image segmentation in order to extract the individual strokes from the scanned image. In this installment, I will apply the technique from part 1, and attempt to merge strokes together to form (some of) the desired glyphs.

I’m going to pay particular attention to “dotted” glyphs, such as “i”, “j”, “;” and “?”. I will need to do future work to merge together non-dotted glyphs such as the two arrows from “≫”, as these are indistinguishable from two characters that happen to be close to one another.

If you’d like to follow along, this notebook and the handwriting file may be found at https://gist.github.com/pelson/b80e3b3ab9edbda9ac4304f742cf292b.

We start by using the technique from part one to segment the images into individual strokes.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
from scipy import ndimage as ndi
from skimage import measure
from skimage.color import rgb2gray


handwriting_img = plt.imread('handwriting_minimal.png')
handwriting_img_gray = rgb2gray(handwriting_img)

labels, _ = ndi.label(handwriting_img_gray < 1)

stroke_locations = measure.regionprops(labels)

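As a quick illustration of what `ndi.label` and `regionprops` give us, here is the same two-step process on a tiny synthetic "scan" (a made-up toy array, not the real handwriting image): each connected blob of dark pixels gets its own integer label, and each region carries a `(min_row, min_col, max_row, max_col)` bounding box.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import measure

# A tiny synthetic "scan": white (1.0) everywhere except two dark blobs.
img = np.ones((6, 8))
img[1:3, 1:3] = 0   # first "stroke"
img[4:6, 5:8] = 0   # second "stroke"

# Label every connected region of dark pixels.
labels, n_strokes = ndi.label(img < 1)
regions = measure.regionprops(labels)

print(n_strokes)                   # 2
print([r.bbox for r in regions])   # [(1, 1, 3, 3), (4, 5, 6, 8)]
```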
Using this information, we create an image array that isolates the label, and use the sub-image’s bounding box as a lookup:

In [2]:
bbox_to_stroke_img = {}

for stroke in stroke_locations:
    # (miny, minx, maxy, maxx) -- scikit-image orders bounding boxes
    # row-major, hence y before x.
    bbox = stroke.bbox

    # Construct a slice that can be used to pick out this bounding box from
    # the full image.
    full_index = (slice(bbox[0], bbox[2]), slice(bbox[1], bbox[3]))

    # Pick out the sub-image, and take a copy so that we can modify it without
    # modifying the original.
    stroke_img = handwriting_img[full_index + (Ellipsis,)].copy()

    # Using the "labels" array, produce a binary mask that is True for every
    # pixel that is marked as this label, and False otherwise.
    stroke_mask = labels[full_index] == stroke.label

    # For each color channel, use the mask to maintain the full image pixels that
    # are part of this stroke. Where a pixel remains that is not part of this stroke,
    # replace it with 1 (ultimately making it white).
    for channel in range(3):
        stroke_img[:, :, channel] = np.where(stroke_mask, stroke_img[:, :, channel], 1)
        
    # Convert the image to an RGB byte array.
    stroke_img = (stroke_img * 255).astype(np.uint8)

    bbox_to_stroke_img[bbox] = stroke_img

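To see the masking step in isolation, here is a minimal sketch on a hand-built 2x2 float RGB array (the pixel values are made up): every pixel outside the mask is pushed to 1.0 (white) channel by channel, and the result is then scaled to a uint8 byte array exactly as above.

```python
import numpy as np

sub_img = np.array([[[0.2, 0.2, 0.2], [0.9, 0.9, 0.9]],
                    [[0.5, 0.5, 0.5], [0.1, 0.1, 0.1]]])
# True marks pixels belonging to this stroke.
stroke_mask = np.array([[True, False],
                        [False, True]])

stroke_img = sub_img.copy()
for channel in range(3):
    # Keep the stroke's pixels; whiten everything else.
    stroke_img[:, :, channel] = np.where(stroke_mask,
                                         stroke_img[:, :, channel], 1)

stroke_img = (stroke_img * 255).astype(np.uint8)
print(stroke_img[0, 1])   # [255 255 255] -- masked out, now white
print(stroke_img[0, 0])   # [51 51 51]    -- 0.2 * 255, part of the stroke
```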
Pick off potentially wide, but not high, strokes – these are our dots and lines that we may want to merge together (though not in all cases – we do want some punctuation glyphs to remain!).

In [3]:
stroke_merge_contenders = {}

for bbox, img in bbox_to_stroke_img.items():
    height = bbox[2] - bbox[0]
    width = bbox[3] - bbox[1]
    if width < 450 and height < 150:
        stroke_merge_contenders[bbox] = img

Great. Let’s do some work to take a look at these images in the notebook. Essentially, the easiest approach is to convert the images to base64 PNGs, and display them in raw HTML. I also want the images to display inline so that I can show many images at once.

In [4]:
from IPython.display import display, HTML
from io import BytesIO
import PIL
import base64


def html_float_image_array(img, downscale=5, style="display: inline; margin-right:20px"):
    """
    Generate a base64-encoded image (scaled down) that is displayed inline.
    Great for showing multiple images in notebook output.

    """
    im = PIL.Image.fromarray(img)
    bio = BytesIO()
    height, width = img.shape[:2]
    if downscale and (width > downscale ** 2 and height > downscale ** 2):
        # NB: PIL sizes are (width, height), the opposite of numpy's
        # (rows, cols) shape ordering.
        im = im.resize([width // downscale, height // downscale],
                       PIL.Image.ANTIALIAS)
    im.save(bio, format='png')
    encoded_string = base64.b64encode(bio.getvalue())
    html = ('<img src="data:image/png;base64,{}" style="{}"/>'
            ''.format(encoded_string.decode('utf-8'), style))
    return html

Using this new function on our stroke_merge_contenders:

In [5]:
display(HTML(''.join(html_float_image_array(img)
                     for bbox, img in stroke_merge_contenders.items())))

[Output: the candidate stroke images (dots, dashes and small marks) displayed inline.]

Now, let’s take a look at the bounding boxes of these marks, so that we can get some intuition about what we have picked out:

In [6]:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

plt.figure(figsize=(10, 10))
ax = plt.axes()
ax.imshow(handwriting_img)
for bbox, img in stroke_merge_contenders.items():
    height = bbox[2] - bbox[0]
    width = bbox[3] - bbox[1]
    rect = mpatches.Rectangle([bbox[1], bbox[0]], width=width, height=height,
                                facecolor='none', edgecolor='blue')
    ax.add_patch(rect)
plt.show()
[Output: the full handwriting scan with blue bounding boxes drawn around each candidate stroke.]

This looks promising. Let’s try to merge the chosen images with other, suitably close, strokes. In all situations, we want to allow a reasonable y distance but only a small x distance (consider, for example, the two strokes of a speech mark).

First, define the function that can actually do the merging of two images for us. We will have two image arrays and two bounding boxes, and we want to come up with a single image array and bounding box.

In [7]:
def merge_images(img1, img1_bbox, img2, img2_bbox):
    """
    Merge together two images that have different bounding boxes.

    """
    # The new image bounding box.
    bbox = (min([img1_bbox[0], img2_bbox[0]]),
            min([img1_bbox[1], img2_bbox[1]]),
            max([img1_bbox[2], img2_bbox[2]]),
            max([img1_bbox[3], img2_bbox[3]]))

    # The new image shape.
    shape = (bbox[2] - bbox[0], bbox[3] - bbox[1], 3)

    # The slice for image 1 inside of the new image array.
    img1_slice = (slice(img1_bbox[0] - bbox[0], img1_bbox[2] - bbox[0]),
                  slice(img1_bbox[1] - bbox[1], img1_bbox[3] - bbox[1]))

    # The slice for image 2 inside of the new image array.
    img2_slice = (slice(img2_bbox[0] - bbox[0], img2_bbox[2] - bbox[0]),
                  slice(img2_bbox[1] - bbox[1], img2_bbox[3] - bbox[1]))

    # Construct the new image, and fill it with white.
    merged_image = np.full(shape, 255, dtype=np.uint8)

    # Use all of image 1 and just drop it into the correct location within
    # the new image.
    merged_image[img1_slice] = img1

    # We can't use the same approach for image 2, as it potentially overlaps
    # with image 1. Instead, keep image 2's pixels wherever they aren't at
    # the maximum (white) of each color channel, and fall back to what is
    # already in the merged image otherwise.
    merged_image[img2_slice] = np.where(img2 != 255, img2,
                                        merged_image[img2_slice])

    return merged_image, bbox

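The overlay trick on the last few lines of `merge_images` is worth seeing in isolation (a toy single-channel example, not part of the notebook): `np.where` keeps image 2's pixel wherever it is not pure white, and otherwise falls back to whatever image 1 already put on the canvas.

```python
import numpy as np

canvas = np.array([[10, 20],
                   [30, 40]], dtype=np.uint8)   # already holds image 1
img2 = np.array([[255, 99],
                 [255, 255]], dtype=np.uint8)   # 255 == white background

# Non-white pixels of img2 win; white pixels defer to the canvas.
overlaid = np.where(img2 != 255, img2, canvas)
print(overlaid)
# [[10 99]
#  [30 40]]
```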
This function probably also exists within skimage. Now that we have the ability to merge images, let’s define some functions that will help us measure the distance between candidate strokes.

In [8]:
def min_interval_distance(interval_1, interval_2):
    """
    Calculate the minimum distance between two intervals.

    >>> min_interval_distance([0, 1], [2, 3])
    1
    >>> min_interval_distance([0, 1], [0.5, 3])
    0
    >>> min_interval_distance([10, 11], [5, 8])
    2
    >>> min_interval_distance([10, 11], [8, 5])
    2

    There is so much room for more elegance here, but hey-ho...

    """
    interval_1_sorted = sorted(interval_1)
    interval_2_sorted = sorted(interval_2)

    min_distance = min([np.abs(i_1 - i_2)
                        for i_1 in interval_1 for i_2 in interval_2])
    within_1 = any(interval_1_sorted[0] <= i_2 <= interval_1_sorted[1]
                   for i_2 in interval_2)
    within_2 = any(interval_2_sorted[0] <= i_1 <= interval_2_sorted[1]
                   for i_1 in interval_1)

    # If either interval has an endpoint inside the other, they overlap.
    if within_1 or within_2:
        min_distance = 0
    return min_distance

def max_interval_distance(interval_1, interval_2):
    """
    Calculate the maximum distance between two intervals.

    >>> max_interval_distance([0, 1], [2, 3])
    3
    >>> max_interval_distance([0, 1], [0.5, 3])
    3
    >>> max_interval_distance([10, 11], [5, 8])
    6
    >>> max_interval_distance([10, 11], [8, 5])
    6

    There is so much room for more elegance here, but hey-ho...

    """
    max_distance = max([np.abs(i_1 - i_2)
                        for i_1 in interval_1 for i_2 in interval_2])
    return max_distance

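With these interval distances in hand, a merge criterion along the lines described earlier (a reasonable y gap, a small x gap) might look like the following. To be clear, the `should_merge` helper and its thresholds are my own illustration of where this is heading, not code from the notebook, and the interval-distance logic is restated compactly so the snippet runs on its own.

```python
def interval_gap(interval_1, interval_2):
    # 0 if the (unordered) intervals overlap, otherwise the smallest
    # endpoint-to-endpoint distance -- the same result as
    # min_interval_distance above.
    i1, i2 = sorted(interval_1), sorted(interval_2)
    if i1[0] <= i2[1] and i2[0] <= i1[1]:
        return 0
    return min(abs(a - b) for a in interval_1 for b in interval_2)


def should_merge(bbox1, bbox2, max_x_gap=20, max_y_gap=120):
    # Bounding boxes are (miny, minx, maxy, maxx): allow a generous
    # vertical gap (e.g. the dot floating over an "i" stem) but only a
    # small horizontal one. The thresholds here are hypothetical.
    y_gap = interval_gap(bbox1[0::2], bbox2[0::2])
    x_gap = interval_gap(bbox1[1::2], bbox2[1::2])
    return x_gap <= max_x_gap and y_gap <= max_y_gap


# A dot floating above an "i" stem: overlapping x range, modest y gap.
dot = (0, 10, 8, 18)
stem = (40, 9, 120, 20)
print(should_merge(dot, stem))    # True

# A stroke belonging to a neighbouring glyph: far away horizontally.
other = (0, 300, 120, 360)
print(should_merge(dot, other))   # False
```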
Philip Elson
