A project to reconstruct the 3D model of room architecture from photographs.
My Final Year Project aims to reconstruct a 3D model of room architecture from photographs, focusing specifically on extracting the relevant structural edges in each image. This has always been the greatest challenge in the process, because traditional deterministic approaches such as the Canny edge detector have been unable to isolate structural edges from non-structural ones, especially in images with a lot of noise. Machine learning is therefore proposed as a possible alternative for detecting structural edges in photographs.
In recent years, image processing capabilities have improved substantially. In particular, Fully Convolutional Networks (FCNs) have become very popular for image segmentation. By classifying each pixel as belonging to a structural edge or not, an FCN can generate a mask that predicts the location, size and shape of structural edges. For this project, I adapted the U-Net model and implemented it in PyTorch.
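To make the per-pixel classification idea concrete, here is a minimal sketch of a one-level U-Net-style network in PyTorch. It is not the adapted model used in the project: the names `TinyUNet`, `conv_block` and `predict_mask`, the channel widths, the single encoder/decoder level and the 0.5 threshold are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Two 3x3 convolutions with ReLU; padding=1 keeps the spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class TinyUNet(nn.Module):
    """Hypothetical one-level U-Net: encoder, bottleneck, decoder with one skip connection."""

    def __init__(self, in_channels: int = 3, base: int = 8):
        super().__init__()
        self.enc = conv_block(in_channels, base)
        self.pool = nn.MaxPool2d(2)  # halves height and width
        self.bottleneck = conv_block(base, base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec = conv_block(base * 2, base)  # input is upsampled + skip features
        self.head = nn.Conv2d(base, 1, kernel_size=1)  # one logit per pixel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e = self.enc(x)
        b = self.bottleneck(self.pool(e))
        # Skip connection: concatenate encoder features with the upsampled ones.
        d = self.dec(torch.cat([self.up(b), e], dim=1))
        return self.head(d)


def predict_mask(model: nn.Module, images: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Return a boolean mask: True where a pixel is predicted to lie on a structural edge."""
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(images))
    return probs > threshold
```

With this sketch, the input height and width must be even so that the upsampled features line up with the skip connection. A binary cross-entropy loss on the logits (for example `nn.BCEWithLogitsLoss`) would be one natural training objective for such a mask, although that choice is also an assumption here.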
Since no similar projects appear to have been done before, it was difficult to find relevant labelled datasets. Eventually, I had to label images from the LSUN dataset by hand. To speed up the labelling process, I built a custom labelling tool designed specifically for edge labelling. After augmenting the dataset to around 20,000 labelled images, I trained the model for 50 epochs.
After some post-processing of the output mask, the Hough Transform is applied to determine the equation of each line that represents an edge. Using this result, a regression can be conducted to create a skeleton of the 3D structure. Through this project, I have learnt the theory behind machine learning, particularly in the field of computer vision.
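The Hough Transform step mentioned above can be sketched in a few lines of plain Python. This is an illustrative version, not the project's implementation (in practice a library routine would likely be used): the function names `hough_lines` and `slope_intercept`, the one-degree angular resolution and the `min_votes` parameter are assumptions. Each edge pixel votes for every line x*cos(theta) + y*sin(theta) = rho that passes through it, and lines with many votes are kept.

```python
import math


def hough_lines(mask, n_theta=180, min_votes=2):
    """Return (rho, theta, votes) for lines supported by at least min_votes edge pixels.

    mask is a list of rows of 0/1 values. A line is described by
    x*cos(theta) + y*sin(theta) = rho, with x the column and y the row index.
    """
    thetas = [math.pi * k / n_theta for k in range(n_theta)]
    votes = {}
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if not value:
                continue
            # Every edge pixel votes once per candidate angle.
            for k, theta in enumerate(thetas):
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                votes[(rho, k)] = votes.get((rho, k), 0) + 1
    lines = [(rho, thetas[k], count) for (rho, k), count in votes.items() if count >= min_votes]
    # Strongest lines first.
    return sorted(lines, key=lambda line: -line[2])


def slope_intercept(rho, theta):
    """Convert (rho, theta) to (m, c) with y = m*x + c; returns None for vertical lines."""
    s = math.sin(theta)
    if abs(s) < 1e-9:
        return None
    return -math.cos(theta) / s, rho / s
```

For example, a mask containing a single horizontal row of edge pixels at row 4 yields a strongest line with rho = 4 and theta close to pi/2, which `slope_intercept` turns into a slope near 0 and an intercept near 4. Because neighbouring angles can collect the same number of votes on a coarse pixel grid, a real pipeline would usually merge near-duplicate lines before fitting the 3D skeleton.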