Generating City Models By Using Panorama Images

Some time ago I was developing methods and software to transform panorama images into 3D models. The panorama image itself is projected onto a generated 3D mesh, which yields a textured 3D model that can serve as the basis for different applications. In this article I describe my approach and outline some possible applications. A somewhat similar approach is taken by viewfinder, which enables the user to position an image in a 3D context.

Project Goals

Phase I goals:

  • Automatically creating a textured 3D city model.

Phase II goals:

  • Automatically align any image that was taken within the area of the created city model.

Phase III goals:

  • Refine geometric data of the city model by using structure from motion techniques.

Project Description

By using OpenStreetMap (OSM) data and basic computer-vision edge detection on panorama images, it is possible to find simple correlating points between the panorama image and the OpenStreetMap data. The process is described below.

The aim is to develop software that is capable of transforming information gathered from panoramic images and geolocation-based map data into a world model. This world model can then be used to estimate the position of a viewer by correlating template images rendered from the world model with the image that was taken from the viewer's position.


This video shows the basic principle of projecting a panorama image onto 3D surfaces. It is just a proof of concept showing that OpenStreetMap data is a valid basis for the mesh generation. In my opinion the accuracy is quite good.

Mesh generation:

This is the panorama source image.

[Image: mapped lines]

By using the initial camera position and orientation of the panorama image, we can align the OpenStreetMap data. The arrows in the image indicate the direction in which the facing surface will be generated.
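The alignment step can be pictured as a 2D rigid transform. The following is a minimal sketch, assuming the OSM points are already in a metric plane (for example UTM) and that the camera heading is given in radians, counter-clockwise from the +x axis. `Vec2` and `toCameraFrame` are hypothetical names chosen for illustration and are not taken from the original program.

```cpp
#include <cmath>

struct Vec2 { double x, y; };

// Express a map point in the camera's local frame: the camera sits at the
// origin and looks along +x. The heading convention (radians, CCW from +x)
// is an assumption of this sketch.
Vec2 toCameraFrame(const Vec2& p, const Vec2& cameraPos, double heading) {
    const double dx = p.x - cameraPos.x;
    const double dy = p.y - cameraPos.y;
    const double c = std::cos(heading);
    const double s = std::sin(heading);
    // Rotate the translated point by -heading.
    return Vec2{ dx * c + dy * s, -dx * s + dy * c };
}
```

With this convention, a point straight ahead of the camera ends up on the positive local x axis, and a point to its left gets a positive local y value.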


The bottom edge of the surface is defined by the OpenStreetMap data and the top edge by the extracted contour.
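One way to build such a surface is to interpolate between the two lines. The sketch below assumes that the bottom line and the top contour have been sampled at the same number of points and that the rows in between are linearly interpolated. The original mesh algorithm is not described in detail here, so `Mesh` and `meshBetweenLines` are hypothetical names and this is only one possible approach.

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

struct Mesh {
    std::vector<Vec3> vertices;
    std::vector<std::array<std::size_t, 3>> triangles;  // vertex indices
};

// Build a regular triangle grid between a bottom and a top polyline.
// Assumption: both polylines have the same number of points (>= 2).
Mesh meshBetweenLines(const std::vector<Vec3>& bottom,
                      const std::vector<Vec3>& top,
                      std::size_t rows) {
    Mesh mesh;
    const std::size_t cols = bottom.size();
    if (cols < 2 || top.size() != cols || rows == 0) return mesh;

    // Vertices: rows + 1 interpolated copies of the line, bottom to top.
    for (std::size_t r = 0; r <= rows; ++r) {
        const double t = static_cast<double>(r) / static_cast<double>(rows);
        for (std::size_t c = 0; c < cols; ++c) {
            const Vec3& b = bottom[c];
            const Vec3& u = top[c];
            mesh.vertices.push_back(Vec3{ b.x + t * (u.x - b.x),
                                          b.y + t * (u.y - b.y),
                                          b.z + t * (u.z - b.z) });
        }
    }

    // Two triangles per grid cell.
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c + 1 < cols; ++c) {
            const std::size_t i0 = r * cols + c;
            const std::size_t i1 = i0 + 1;
            const std::size_t i2 = i0 + cols;
            const std::size_t i3 = i2 + 1;
            mesh.triangles.push_back(std::array<std::size_t, 3>{ i0, i1, i3 });
            mesh.triangles.push_back(std::array<std::size_t, 3>{ i0, i3, i2 });
        }
    }
    return mesh;
}
```

The number of rows controls how finely the surface is subdivided vertically. The grid spacing is uniform only if the input lines are sampled uniformly.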

Map generation:

To avoid generating surfaces that the viewer cannot see, we remove the corresponding lines from the OSM map data as described here. First, we generate the normal vector for each line to determine whether the (bottom) line will be part of a facing surface.


We remove all lines that do not belong to a facing surface by comparing the view vector with the normal vector.
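The facing test can be sketched with a 2D dot product. This is a minimal sketch, assuming each map line is a 2D segment and that its normal is the left-hand perpendicular of its direction. Which side counts as "outward" depends on the winding order of the building outlines, so the sign convention and the names `Segment`, `normalOf`, `facesViewer`, and `keepFacing` are assumptions for illustration.

```cpp
#include <vector>

struct Vec2 { double x, y; };
struct Segment { Vec2 a, b; };

// Left-hand perpendicular of the segment direction (not normalized).
Vec2 normalOf(const Segment& s) {
    return Vec2{ -(s.b.y - s.a.y), s.b.x - s.a.x };
}

// A segment faces the viewer when its normal points against the view
// vector, which runs from the viewer to the segment midpoint.
bool facesViewer(const Segment& s, const Vec2& viewer) {
    const Vec2 n = normalOf(s);
    const Vec2 mid{ (s.a.x + s.b.x) / 2.0, (s.a.y + s.b.y) / 2.0 };
    const Vec2 view{ mid.x - viewer.x, mid.y - viewer.y };
    return n.x * view.x + n.y * view.y < 0.0;
}

// Keep only the segments that face the viewer.
std::vector<Segment> keepFacing(const std::vector<Segment>& segments,
                                const Vec2& viewer) {
    std::vector<Segment> kept;
    for (const Segment& s : segments) {
        if (facesViewer(s, viewer)) kept.push_back(s);
    }
    return kept;
}
```

Reversing the point order of a segment flips its normal, so the same wall is kept or dropped depending on how the outline is traversed.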


After removing these lines, we also remove the lines that are currently not visible to the viewer. This can be done by defining a 2D view volume; all lines that fall outside it are removed.
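A 2D view volume test might look like the sketch below. It assumes the view volume is an unbounded wedge defined by the viewer position, a heading, and half of the horizontal field of view (below 90 degrees). It ignores occlusion by other buildings. `ViewVolume2D`, `isVisible`, and `cullToView` are hypothetical names and not part of the original program.

```cpp
#include <cmath>
#include <vector>

struct Vec2 { double x, y; };
struct Segment { Vec2 a, b; };

// Unbounded 2D view wedge; halfFov is assumed to be below pi/2.
struct ViewVolume2D {
    Vec2 origin;
    double heading;  // view direction, radians CCW from +x
    double halfFov;  // half of the horizontal field of view, radians
};

double cross(const Vec2& a, const Vec2& b) { return a.x * b.y - a.y * b.x; }

// True if p lies inside the wedge (boundary included).
bool contains(const ViewVolume2D& v, const Vec2& p) {
    const Vec2 d{ p.x - v.origin.x, p.y - v.origin.y };
    const double along = d.x * std::cos(v.heading) + d.y * std::sin(v.heading);
    return along >= std::hypot(d.x, d.y) * std::cos(v.halfFov);
}

// True if the ray from v.origin at the given angle hits the segment.
bool rayHits(const ViewVolume2D& v, double angle, const Segment& s) {
    const Vec2 dir{ std::cos(angle), std::sin(angle) };
    const Vec2 e{ s.b.x - s.a.x, s.b.y - s.a.y };
    const Vec2 ao{ s.a.x - v.origin.x, s.a.y - v.origin.y };
    const double denom = cross(dir, e);
    if (denom == 0.0) return false;           // parallel: endpoint test decides
    const double t = cross(ao, e) / denom;    // distance along the ray
    const double u = cross(ao, dir) / denom;  // position along the segment
    return t >= 0.0 && u >= 0.0 && u <= 1.0;
}

// A segment is visible if an endpoint is inside the wedge or if the
// segment crosses one of the two boundary rays.
bool isVisible(const ViewVolume2D& v, const Segment& s) {
    return contains(v, s.a) || contains(v, s.b) ||
           rayHits(v, v.heading - v.halfFov, s) ||
           rayHits(v, v.heading + v.halfFov, s);
}

std::vector<Segment> cullToView(const std::vector<Segment>& segments,
                                const ViewVolume2D& v) {
    std::vector<Segment> kept;
    for (const Segment& s : segments) {
        if (isVisible(v, s)) kept.push_back(s);
    }
    return kept;
}
```

Checking the boundary rays as well as the endpoints matters for long walls that span the whole field of view while both of their ends lie outside it.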



Precisely aligning any image within this scene, given some parameters about its location, opens up a variety of applications, such as real-time augmented reality. You can superimpose any content over an image or a video of the scene. More specifically, you could align your content to any surface the scene provides. Imagine arrows that point in a direction along the walls of a building.


Currently I'm developing the applications in C++ using OpenCV and OpenSceneGraph (OSG).

What I have done:

  • Projective textures using GLSL shaders. - 15.07.2009

  • Extracting horizon contour from images using OpenCV functions. - 19.07.2009

  • Creating a view volume that represents the actual view of the 'projector' within the program. - 06.07.2009

  • Aligning the extracted contour with the view volume of the program so that the contour is superimposed over the viewed scene. - 22.07.2009

  • Transforming OSM data coordinates to UTM. - 01.08.2009

  • Combining mesh creation, OSM data, and projection of image data within one program. - 03.08.2009

  • Extracting OpenStreetMap data via the XML web API by using an adapted C library. - 03.08.2009

  • Developing a fast algorithm for generating a homogeneous polygon mesh between two lines. - 09.08.2009

What needs to be done:

  • Aligning an unknown image that was taken within the scene, using SIFT template matching as the key technology.

  • Handling multiple projection sources within one scene

  • Using accelerometer data from a Wiimote to match the viewer orientation within the scene.

  • Using GPS data to match the viewer position within the scene.

  • Refining the viewer attitude (position, field of view, orientation) by examining an external image feed.


Screenshots:

Here you can see the generated mesh of the MQ using OpenStreetMap data. A projection is applied as a texture to the model.


Here you can see one projection face. A full projection consists of four projection planes, one for each direction. As you can see, the projection is divided into smaller parts. This is necessary to achieve the maximum image resolution of the projection, since the maximum texture resolution is 2048x2048.
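The subdivision can be illustrated with a simple grid split. The sketch below assumes a projection image is cut into axis-aligned tiles no larger than the texture limit; the exact scheme used in the program is not stated above, so `Tile` and `splitIntoTiles` are hypothetical names.

```cpp
#include <algorithm>
#include <vector>

struct Tile { int x, y, width, height; };  // pixel rectangle in the source image

// Split an image into tiles whose sides do not exceed maxTextureSize.
// Tiles on the right and bottom borders may be smaller than the limit.
std::vector<Tile> splitIntoTiles(int imageWidth, int imageHeight,
                                 int maxTextureSize = 2048) {
    std::vector<Tile> tiles;
    if (imageWidth <= 0 || imageHeight <= 0 || maxTextureSize <= 0) return tiles;
    for (int y = 0; y < imageHeight; y += maxTextureSize) {
        for (int x = 0; x < imageWidth; x += maxTextureSize) {
            tiles.push_back(Tile{ x, y,
                                  std::min(maxTextureSize, imageWidth - x),
                                  std::min(maxTextureSize, imageHeight - y) });
        }
    }
    return tiles;
}
```

For example, a 5000x3000 pixel projection face would be split into a grid of 3 by 2 tiles, each of which fits into a single 2048x2048 texture.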


This is the projector view.


The panorama image was taken by Jan Zarnikov. Panorama at