Rasmus Emil Albrektsen – Raalb17

Rasmus Stamm – Rasta17

Mads Obel Jensen – Madje17

Video demonstration of the app

The demonstration shows the books table view that you first encounter when opening the app. From here, you can go to the “About” view, where (once implemented) the user can find information on how to use the app, among other things. From the books table, the user can select the specific book they wish to use with the camera. In the demonstration, the first book selected matches the physical book held in front of the camera, but the second does not. This shows that the app only scans for the resources belonging to the selected book.
From the books table, the user can also search among the available books.


The group was inspired by how AR apps work and wanted to develop an idea that made use of this technology. We finally decided to create an app that could help people who are visual learners.

When studying, we felt that reading for a long time could often lead to not actually paying attention to the material in the book.
Our idea to fix this was to play a video that explains or summarizes something connected to an image in a book. Therefore, the problem statement that we as a group set out to answer was:

How do we create an app that helps people who are visual learners study better?

The specific idea behind the app was for the user to be able to hover the camera above an image (for example a diagram) in a study book, and then see a video played on top of said image, within its borders. The video would be an explanatory video that supports the material in the book.

The app has resource folders that contain the different images within a book. When a book is chosen, the app will recognize the individual images as you go along, and then play a video over each image.

There are many more features one could pursue as well. Beyond just playing videos, one could show a 3D model that the user could interact with.
Furthermore, one could implement audio that explains something in a book, triggered by targets in the book that the camera can recognize.
Not to mention, the technology could be applied to children’s books to make them come alive with animations.

Methods and Materials


We started the process by brainstorming to gather some ideas of our own, after having heard the ideas of other people from the iOS Programming course. We fairly quickly arrived at the idea of making something based on AR. We had a few different ideas within this area, but settled on an app that could show 3D content relating to books.

Investigating similar solutions

We searched high and low on the internet and didn’t find any solutions quite the same as our planned solution. One competing solution we did find had the same basic functionality, but lacked support for different books and the ability to interact with the projected 3D models.

Prototype and design

To test our basic idea, we did some basic UI design in an app called MarvelApp, allowing users to go through the app to see if the navigation through the various views was intuitive and easy to use. The first design wasn’t quite there, so through two sessions of changes, we made the app relatively simple and easy to use.

The first agreed design consisted of three views: the AR camera, the settings, and a view where you can find help on how to use the app. The idea was to have the camera appear first upon opening the app, and then allow the user to navigate using a tab bar at the bottom. The design can be seen in the picture below.

This design was used to create the first illustrative prototype, which the group used to collect feedback before creating the working prototype.


We had a few friends and family members test the very barebones MarvelApp version of the app to gather feedback on the usability of the system. This resulted in us changing the layout to be more aligned with what we ended up using.

Use case diagrams

We made some use case diagrams documenting the original base functionality of the different views in the app. They were not so much meant for development purposes, but show the basic functions of the app in a very simple and manageable way.

The camera use case diagram is very simple, as we had only planned one function for the camera view, and that function is not even manually activated. It is called implicitly when the user points the camera at something the app recognizes.

The settings view was originally planned to have quite a few features. These features were primarily meant to let the user customize the app’s behavior to their liking. “Ask before opening camera” was a setting meant to give the user more control over when the app has access to the camera, increasing the user’s privacy.

The books view was meant to let the user browse through the different supported books and search them too. The user should then be able to select a book and see information about it. Both of these functions were to be called explicitly by the user.

MVC object diagram

The original MVC structure was very simple, as we did not have the technical knowledge to plan out the structure correctly up front. The structure and details are therefore very limited. The main focus here was to show which views, controllers and models we were thinking of implementing. This, however, changed a bit during the process.


To order the requirements and rank them by their importance for the project, the group decided to use the MoSCoW model. The groups below are listed in order of decreasing priority:


  • Use the camera to recognize images.
  • Play videos over the image recognized.
  • Contain multiple books to select from.


  • Be able to search the compatible books.
  • Turn sound off in the settings for the videos played.


  • Be able to play a video in a browser.
  • Display a 3D model when recognizing images.
  • Play sound when recognizing images.
  • Contain settings as to which default browser is used.


Before turning the design ideas and illustrative prototype into actual code, the core principles were decided on:

  • The app would follow an MVC architecture.
  • The app would use ARKit and SceneKit.

Aside from this, a general idea of which classes would be created was also thought out. Before implementation, the idea was to create three different controllers for three different views, which would then use a model class “Books” for data purposes when loading the different resources needed for the AR camera.
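
To illustrate the model idea, a minimal sketch of how a book entry and a catalogue of books could look is given here. The property names `title` and `resource` mirror the ones used in the controller code in the source code section; the `Book` and `BookList` types as written, the `coverImageName` field and the lookup helper are assumptions made for illustration and are not the project's actual model class.

```swift
import Foundation

// Hypothetical model: one entry per supported book.
struct Book: Equatable {
    let title: String
    // Name of the AR resource group that holds the book's reference images.
    let resource: String
    // Name of the cover image asset (assumed field).
    let coverImageName: String
}

// Hypothetical catalogue used to fill the books table.
struct BookList {
    private(set) var books: [Book] = []

    // Adds a book unless its resource group is already listed.
    mutating func add(_ book: Book) {
        guard !books.contains(where: { $0.resource == book.resource }) else { return }
        books.append(book)
    }

    // Finds the book that owns a given resource group, if any.
    func book(withResource resource: String) -> Book? {
        books.first { $0.resource == resource }
    }
}
```

Keeping the resource group name on the model means the camera controller only needs the selected book to know which images to look for.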

Evaluation of the early phase deliverables

The early phase designs and prototypes were evaluated using feedback from friends and family, as well as the feedback gathered during the presentation of our early prototype and design in class.
The main feedback for the project was to decide on the target users for the app, and then focus the functionality around these users.
Furthermore, feedback was given on the design. It was suggested that we should consider what we thought would be the most user-friendly way to navigate through the app on startup. Our initial idea was to have the camera open first, but since we had discussed the idea of selecting a specific book, we instead began focusing on a different design that made more sense for navigation when a book is selected first.



The brainstorming process didn’t take too long, as we had already decided on working with AR in relation to books. First, the group came up with two possible types of books to work on:

  • Children’s books
  • Study books

The group decided to go with study books, since focusing on children’s books would take a lot of work in creating animations, and we feared this would take more time to learn than what was available.

Now the only question was what we wanted to display when a picture was recognized.
We had two ideas that we chose to focus on: either we played a video within the borders of the image, or we displayed a 3D rendering of something on top of the image.
We decided to go with the videos, since we thought this would generally be more relevant to the images one could find in a study book.

Prototype and design

The working prototype proved quite different from the initial design and prototype created in the early phases. Instead of having a view focused on guiding the user through using the app, we created a view to select books.

The “Book” view now appears first instead of the camera. From here, the user can follow two paths:

  • Press a button that leads to the “About” view, where the user can find information on how to use the app, as well as frequently asked questions. This view replaced the intended view for settings.
  • Select a book, which then leads the user to the camera, now focused on finding images related to the selected book and then displaying a video on top of the image.

The finished design can be seen in the video demonstration of the app. 

Evaluation of the end design and prototype

The design was changed in accordance with the feedback from the early phase and with what made logical sense for how the target user should navigate through the app. When the app is opened, the user now sees a table containing the different books available for use with the app. The table can be searched, and once a book is selected, the user is taken to the camera.
This design made more sense, considering that a user has to select a book in order to make the camera work. Furthermore, the view that was first intended for settings is now called the “About” view, since the available time made it hard to implement settings as well. This view is accessed from a button at the top of the books table, since it did not seem important enough to be highlighted further.
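
As an illustration of the search described above, a case-insensitive title filter could be sketched as follows. The function name `filterTitles` and the choice to match anywhere in the title are assumptions; the report does not state how the app's own search compares titles.

```swift
import Foundation

// Returns the titles that contain the search text, ignoring case.
// An empty or whitespace-only query returns every title unchanged.
func filterTitles(_ titles: [String], matching query: String) -> [String] {
    let trimmed = query.trimmingCharacters(in: .whitespaces)
    guard !trimmed.isEmpty else { return titles }
    return titles.filter { $0.range(of: trimmed, options: .caseInsensitive) != nil }
}
```

A function like this could be called each time the search text changes, with the result used as the data source for the table.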

Current structure of the application

The model describes the communication between objects in the MVC structure, as it is in the current state of the project. It shows how the controllers communicate with their specific views, and which model classes are used by which controllers.

Source code

This section presents the source code for the “ARViewController”, which we use for the camera view. Since this is the core functionality of the app, it is the main focus.

As mentioned earlier, the controller uses ARKit and SceneKit, as well as UIKit.
The controller is connected to a view that contains an “ARSCNView”.
The first three functions in the controller handle what happens when the view loads, when it appears, and when it disappears.

    override func viewDidLoad() {
        super.viewDidLoad()
        imageView.image = testBook?.cover
        navigationItem.title = testBook?.title
        sceneView.delegate = self
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let config = ARImageTrackingConfiguration()
        // Load the reference images from the resource group named by the selected book.
        if let resource = testBook?.resource,
           let trackedImages = ARReferenceImage.referenceImages(inGroupNamed: resource, bundle: Bundle.main) {
            config.trackingImages = trackedImages
            config.maximumNumberOfTrackedImages = 1
        } else {
            print("cant find image")
        }
        sceneView.session.run(config)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        sceneView.session.pause()
    }

When the view appears, an “ARImageTrackingConfiguration” is instantiated, so that the camera can begin looking for known images in what it sees.
Once the configuration is created, the reference images are loaded from the resource folder whose name matches the resource string sent along with the selected book. These are set as the images to track, and the configuration is limited to tracking one image at a time.

At the end of the method, the session is run on the “ARSCNView”, called “sceneView”.
The disappear method pauses this session once the user navigates to a different view.

The most important function besides the ones mentioned is the “renderer” function.

    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        // Only handle image anchors whose name matches a bundled .mp4 file.
        guard let imageAnchor = anchor as? ARImageAnchor,
              let videoURL = Bundle.main.path(forResource: imageAnchor.name ?? "no name", ofType: "mp4") else {
            print("no video found")
            return
        }
        print(imageAnchor.name ?? "no name found")

        let videoToPlay = AVPlayerItem(url: URL(fileURLWithPath: videoURL))
        let player = AVPlayer(playerItem: videoToPlay)
        let videoNode = SKVideoNode(avPlayer: player)
        videoNode.play()
        // Restart the video when it reaches the end, so that it loops.
        NotificationCenter.default.addObserver(forName: .AVPlayerItemDidPlayToEndTime, object: player.currentItem, queue: nil) { _ in
            player.seek(to: CMTime.zero)
            player.play()
            print("Looping Video")
        }

        // Place the video in a SpriteKit scene, flipped vertically so it is not upside down.
        let videoScene = SKScene(size: CGSize(width: 480, height: 360))
        videoNode.position = CGPoint(x: videoScene.size.width / 2, y: videoScene.size.height / 2)
        videoNode.yScale = -1.0
        videoScene.addChild(videoNode)
        // Use the scene as the material of a plane with the physical size of the tracked image.
        let plane = SCNPlane(width: imageAnchor.referenceImage.physicalSize.width, height: imageAnchor.referenceImage.physicalSize.height)
        plane.firstMaterial?.diffuse.contents = videoScene
        let planeNode = SCNNode(geometry: plane)
        // Rotate the plane so it lies flat on the image.
        planeNode.eulerAngles.x = -Float.pi / 2
        node.addChildNode(planeNode)
    }

This function makes sure that when the camera discovers an image that matches one from a resource folder, the correct video is displayed on top of it.
The function first checks that the anchor passed in is an “ARImageAnchor”, which is the case when ARKit has recognized one of the images in the resources. If so, the path for the video to display is looked up. The path is created from the name of the tracked image, since an image and its corresponding video have the same name.
The path is made into a URL that is passed to a video player based on an “AVPlayer”, which is then added to an “SKVideoNode”. A scene is then created for the video node and used as the material of a plane with the same width and height as the tracked image, so that the video is displayed on top of the image in accordance with its dimensions.
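
The naming convention between images and videos can be sketched in isolation as follows. This is an illustrative helper, not part of the controller: the function `videoFileName(forImageNamed:availableVideos:)` and the idea of passing the set of bundled video file names explicitly are assumptions, made so the rule can be tried out without ARKit or an app bundle.

```swift
import Foundation

// Returns the file name of the video that belongs to a recognized image,
// following the convention that image and video share the same base name.
// `availableVideos` stands in for the video files shipped with the app (assumed).
func videoFileName(forImageNamed imageName: String?, availableVideos: Set<String>) -> String? {
    guard let name = imageName, !name.isEmpty else { return nil }
    let candidate = "\(name).mp4"
    // Only report a video if one with the matching name actually exists.
    return availableVideos.contains(candidate) ? candidate : nil
}
```

The consequence of this convention is that adding content for a new image only requires adding a video file with the same name as the image; no lookup table needs to be maintained.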


The project successfully implemented the feature of scanning and recognizing a part of a book and then showing a video relating to the given subject in the book. This was the main goal that the group set out to achieve, which is why the group considers the core implementation of the project to be in place.

The design process throughout the project showed that feedback and trial were important when trying to find the most logical and easy way of navigating through the app.
Reducing the prominence of a view like the “About” view helped make the app as simple as the group intended it to be.
The fact that a user now opens the app to a table of books to select from is a great example of how the feedback from friends was implemented. It makes a lot more sense, both in terms of how the app is coded and in terms of logical navigation, that you first select the book you want to use and then go to the camera.


During the idea generation phase, we found a vaguely similar app, but it was only meant for the Asian market and only worked with one book. It did allow the user to play back video content “in” the book. Otherwise, the market is very sparsely populated, and the AR segment of apps is generally very small. There is therefore not a lot of competition, although the general idea exists and can be found in similar implementations.


The app is very barebones in its current state. It is possible to show videos “in” books, but it is not really possible to interact with the content. In the future, it would be a great feature to be able to show models of different things in connection with study books. For example, if a mechanic is reading about an engine, it would be immensely useful to be able to show a model of the given engine. Additionally, enabling the mechanic to interact with the model, e.g. zooming in and out, rotating and possibly even disassembling the model in AR, would be a great feature.

To summarize the results and goals of the project, the group managed to achieve the main goal, which is the core function of the app.
We set out to answer the problem statement, which concerned helping visual learners study. Even though we haven’t been able to question people who identify as visual learners, we’ve worked from the idea that video content provides a break from the tiresome activity of reading a lot. Based on this assumption, the group feels that the problem now has a solution, namely the app this project has created.
There is, however, a lot to be improved on. For further development, we would implement more possibilities to interact with the content seen in the books, as well as different types of content to show from an image in a book. The group has, as stated earlier, thought about the possibilities of adding 3D models and sound to the app, which would be something to work on in future development.
