On the 23rd of July, I’ll be presenting a paper at the Digital Production Symposium (DigiPro) in Anaheim called ‘Camera Tracking in Visual Effects – An Industry Perspective of Structure from Motion’. (Download) The overall aim is to examine why, when there’s so much research being done in automatic camera tracking in the computer vision community, we in VFX spend so much time and human effort figuring out camera movement, even with access to the latest tools, equipment and research. Without wanting to give too much away here, I’d like to share some statistics from the paper that I hope will be interesting to researchers and VFX practitioners about how long various parts of the pipeline take to be completed on a typical Hollywood feature film. I’m really thankful to my co-authors and everyone at Double Negative for helping make this happen and allowing us to share this work.
If you wish to use any of these charts, statistics or other information in your work, please cite our paper :)
Barber, A., Cosker, D., James, O., Waine, T., Patel, R. Camera Tracking in Visual Effects – An Industry Perspective of Structure from Motion. In Digital Production Symposium 2016, Anaheim, CA. http://dx.doi.org/10.1145/2947688.2947697
For an easy to follow and detailed overview of the VFX pipeline, check out Andrew Whitehurst’s great tutorial here.
The chart above shows the average values for each pipeline stage over six feature-film projects worked on by Double Negative from the end of 2014 until early 2016. Without repeating everything in the chart, we can clearly see that compositing is the most time-consuming task, taking nearly twice as long as the next process (effects). It's worth noting that compositing is typically one of the final stages to be performed, so any issues in earlier stages that need to be redone and repaired will have a cumulative knock-on effect on stages further down the pipeline.
I was interested in looking at the 'camera-tracking' (or matchmove) part of the process, as it's a huge topic in computer vision research. Whilst it may look like a relatively small part of the overall pipeline on this chart, it's still a very important task that has a lot of people working on it, and without an accurate camera solve few of the later stages of the pipeline can happen. Given the massive amount of research output in SfM, many academics and researchers are surprised to learn that solving a camera is far from a completely automatic process! I discuss in depth in the paper why this might be, but one of the main reasons is the sheer diversity of shots VFX houses receive. Other than, sometimes, the addition of tracking markers on set, no consideration is given to how easy a scene will be to track when planning what will happen in a shot or camera movement. If movies were like the footage commonly used to test computer vision algorithms in the latest papers, well, I think even fewer people would be paying to see them! (Although the Sintel dataset is doing a good job of changing this.) Weird, uncalibrated lenses, motion blur, occlusion, poor lighting and featureless landscapes are all common in a VFX shot. The DigiPro paper gives a couple of methods for teasing these out of a large dataset of shots, and tries to assess their individual impact on the time it takes to solve a camera's movement for a shot.
Let’s take a look though at how much variance there is in the earlier stages of VFX production, Roto, Prep and Matchmove (RPM) for different types of shows:
Here we see that there's great variance in the amount of time different shows spend on tracking, rotoscoping and prep (painting out markers and rigs), and that this appears to change with the genre of a particular film. In other words, there is no 'typical' amount of time spent on these stages as part of a film project. One of the factors we try to account for in explaining how long a shot takes to solve is the 2D point velocity of static scene points viewed from the camera. Our reasoning was simple: faster camera movements typically lead to motion blur, point occlusion, and new track points needing to be added to the solution at keyframes. All of these factors should mean that computing the camera movement from 2D images takes longer. Motion blur alone is a challenging problem for standard computer vision and optical flow techniques.
The above chart shows the result of this test over 939 shots. There's a general trend, however it's clear that there are a lot of outliers, and a huge variance in the amount of time taken to solve for a set of shots with similar velocities. In the paper we attempt to split these out further, by taking into account the different types of lenses and also the amount of 3D scene data available to a matchmover, for example LiDAR scans or surveys of the set. We also discuss feedback we've received from experienced matchmovers and supervisors about the preferred best practices in the industry. One of the main conclusions we draw is that among the most effective ways to solve difficult shots is to ensure that accurate 3D scene data is gathered at recording time. Even simple measurements such as actors' heights, and the size and locations of markers and features, can be very beneficial in getting a shot solved more quickly. One of the further research areas we suggest is finding new methods of reliably registering metadata to footage. In an earlier work, we propose a method of using the motion blur present in an image to align information from un-synchronised focal-length recorders.
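As a rough sketch of the velocity measure described above (the data structures here are hypothetical illustrations, not the pipeline code from the paper), the mean 2D point velocity of static scene points can be estimated from tracked feature positions like this:

```python
import numpy as np

def mean_point_velocity(tracks, fps=24.0):
    """Estimate the mean 2D point velocity (pixels/second) for a shot.

    tracks: list of per-point tracks, each a sequence of (x, y) image
    positions of one static scene point, one entry per frame.
    """
    speeds = []
    for track in tracks:
        track = np.asarray(track, dtype=float)
        if len(track) < 2:
            continue  # need at least two frames to measure motion
        # Per-frame displacement magnitudes, converted to pixels/second
        deltas = np.linalg.norm(np.diff(track, axis=0), axis=1)
        speeds.append(deltas.mean() * fps)
    return float(np.mean(speeds)) if speeds else 0.0

# Two synthetic tracks: one perfectly static, one moving 2 px/frame in x
tracks = [
    [(100.0, 50.0)] * 10,
    [(x * 2.0, 80.0) for x in range(10)],
]
print(mean_point_velocity(tracks, fps=24.0))  # -> 24.0 (mean of 0 and 48 px/s)
```

In a real study the tracks would come from the matchmove software's 2D feature tracks, and the measure would typically be aggregated per shot before being plotted against solve time.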
Gathering this Data
The charts presented here and in the paper were produced using real production data from six recent feature-film projects. At Double Negative, Shotgun is the main scheduling and resource-allocation platform, and it integrates with a custom pipeline for asset tracking and publishing. The Shotgun API makes it easy to query and aggregate the time taken on different tasks. There were a few gotchas I stumbled over. Several were down to my incomplete understanding of all our pipeline processes, but here are a couple of suggestions for anyone who wants to undertake a similar study at their studio:
Know the Schema This sounds obvious but is something to keep on top of, especially in a large system. I'm not aware of a schema browser for Shotgun, but I managed to get a huge amount of detail out of schema_read() and its associated methods. These return the schema as JSON, which I cached to a file for each entity.
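A minimal sketch of that caching step (the site URL, script name and API key in the comment are hypothetical placeholders): the dictionary returned by Shotgun's schema_read() can be dumped to one JSON file per entity type for offline browsing.

```python
import json
import os

def cache_schema(schema, out_dir="."):
    """Write one JSON file per entity type so the schema can be browsed offline.

    `schema` is the dict returned by shotgun_api3's Shotgun.schema_read(),
    keyed by entity type, with field definitions as values.
    """
    paths = []
    for entity, fields in schema.items():
        path = os.path.join(out_dir, "schema_%s.json" % entity)
        with open(path, "w") as f:
            json.dump(fields, f, indent=2, sort_keys=True)
        paths.append(path)
    return paths

# Against a live site this would be driven by the API (credentials hypothetical):
#   from shotgun_api3 import Shotgun
#   sg = Shotgun("https://yourstudio.shotgunstudio.com",
#                script_name="schema_dump", api_key="XXXX")
#   cache_schema(sg.schema_read())
```

Having the schema on disk also makes it easy to grep for field names when building the aggregation queries.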
Have a strong link between tasks and pipeline assets This is something that, in my opinion, would make this kind of analytical work much easier, and should also benefit production workflow. It's easier said than done, as the amount of data ingested by a VFX facility (footage, metadata, data-wrangler info, VFX briefs and so on) is huge, and will naturally be inconsistently formatted and extremely time-consuming to go through. We suggest that automated computer vision methods could yield real gains at this point of the VFX production pipeline. Intelligently matching reference and witness-camera footage to hero footage, for example, could be interesting.
Talk to people Often, something that seems completely illogical or just plain wrong is actually being done for a specific reason. It's worth taking time out to chat with the people who use the tools daily, to understand their priorities and pick up any tricks they know.
Hopefully this has been an interesting read – do leave a comment if there’s anything you’d like to add. And check out the paper when it’s published. I’ll be at both DIGIPRO and SIGGRAPH 2016 so please get in touch if you’d like to catch up there too.