Rapidly Extracting Views, Geometry, and Metadata from Thousands of Revit Files Concurrently
Task master.
In our decades of Revit development experience, we have been increasingly flummoxed by the challenges of reading the contents of large quantities of .RVT and .RFA files. There are many reasons to do this: managing a large archive of past projects, constructing libraries of kits of parts for manufacturing in construction, or compiling large datasets of architectural information for machine learning.
Traditional tools allow for simple extraction of a handful of Revit files for one-off tasks. Anyone with a seat of Revit can open and export file after file. But it is a tedious chore. Some more technical Revit users might write an add-in or Python script to batch out these files. But what if you have thousands? What if the files are all in the cloud? What if you don’t want to tie up one or more workstations for an extended time opening them? Or what if they are in many different Revit versions?
APS (formerly Forge) does offer APIs to handle some of these situations. But those APIs don’t support some common file formats (glTF, RFA), nor do they make view data readily extractable. Plus, they frequently throttle high volumes of requests. Newer services require a three-legged token, but we wanted to be able to do this work like we would with any other service-to-service API.
Whether one tries to do this locally or via APS, orchestrating all of that compute is complicated and expensive. The APS Model Derivative Service, for example, requires about 12 different API calls to get from file upload to pulling back the desired derivative, assuming that derivative is even what you are looking for.
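To give a sense of that orchestration overhead, here is a rough enumeration of one Model Derivative round trip. The step names paraphrase the documented flow; exact endpoints and call counts vary by workflow and API version, so treat this as an illustration rather than a reference.

```python
# Approximate steps in an APS Model Derivative round trip, start to finish.
# Paraphrased from the documented flow; not an exhaustive or exact list.
APS_DERIVATIVE_STEPS = [
    "request a two-legged OAuth token",
    "create (or look up) an OSS bucket",
    "request a signed upload URL for the file",
    "upload the file bytes to the signed URL",
    "finalize the upload to obtain the object URN",
    "base64-encode the URN for Model Derivative",
    "submit a translation job",
    "poll the manifest until the job finishes",
    "walk the manifest to find the derivative you want",
    "request a download URL for that derivative",
    "download the derivative",
    "verify it is actually the output you needed",
]

for i, step in enumerate(APS_DERIVATIVE_STEPS, start=1):
    print(f"{i:2d}. {step}")
```

Each of those steps is a network call (or polling loop) that your orchestration layer has to sequence, retry, and pay for, per file.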
And so we realized, as we build software to automate the creation of Revit data, that one of the first problems we needed to overcome was the ability to read Revit data at scale. By scale, we mean at least 1,000 files of typical project size, like a mid-rise office building or apartment building (think the Snowdon Towers sample files provided by Autodesk).
We also didn’t want a huge, expensive fleet of servers or workstations to handle these kinds of tasks. Ideally, we could have a truly “serverless” solution: zero up-front cost for doing this extraction, resources automagically provisioned based on task workload, and then, once the job was done, all of that compute would go away. We only wanted to pay for what we used.
To accomplish this task, we needed to engineer a system that did not exist anywhere else. We needed to build our own kind of derivative service. Being clever, we thought long and hard about what to call the service to do these tasks, and so we call it…the Task service.
But does it scale?
To test this Task service, we duplicated the Snowdon Towers sample file 1,000 times, each copy with a new mock name. We kept this batch of files in AWS S3 and wrote an AWS Step Function with a Distributed Map to iterate over all of the files and invoke an AWS Lambda function for each one. The function’s job was to load a cached authentication token and submit the file to our Task service. Then we would watch.
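In outline, the per-file Lambda looked something like the sketch below. The Task service URL, request payload, and token cache location are illustrative placeholders, not our actual API, and the event shape depends on how the Distributed Map’s ItemReader is configured.

```python
# Hypothetical sketch of the per-file Lambda handler. Endpoint, payload
# shape, and token caching are illustrative, not the real Task service API.
import json
import os
import urllib.request

# Placeholder endpoint; the real URL would come from configuration.
TASK_SERVICE_URL = os.environ.get(
    "TASK_SERVICE_URL", "https://tasks.example.com/v1/extract"
)
TOKEN_CACHE_PATH = "/tmp/token.json"  # Lambda's writable scratch space


def load_cached_token():
    """Read a previously cached auth token; a real handler would refresh
    it from a secrets store when expired."""
    with open(TOKEN_CACHE_PATH) as f:
        return json.load(f)["access_token"]


def handler(event, context):
    # The Distributed Map passes one S3 object descriptor per invocation.
    bucket = event["Bucket"]
    key = event["Key"]

    token = load_cached_token()
    body = json.dumps({"s3_uri": f"s3://{bucket}/{key}"}).encode()
    req = urllib.request.Request(
        TASK_SERVICE_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    # Fire the extraction request and report the outcome back to the map.
    with urllib.request.urlopen(req) as resp:
        return {"statusCode": resp.status, "key": key}
```

Because the Distributed Map handles concurrency, retries, and result aggregation, the handler itself stays a few dozen lines: load the token, post the file reference, return.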
Internal CPU Usage During Load Run
Our test started at 5:55 and concluded 10 minutes later at 6:05. All 1,000 files were successfully processed, consuming about 2,000 vCPUs and 4,000 gigabytes of memory.
Tasks Executed
Internal task counts show a dramatic spike and immediate resolution once extraction completes. The task count scales efficiently after an initial plateau. That plateau reflects our ability to analyze a file and then fan out sub-tasks based on the file’s contents, distributing the extraction workload and dramatically improving performance.
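The fan-out pattern can be illustrated as follows, with hypothetical names rather than the service’s real schema: an initial analyze step inspects the file, then emits independent sub-tasks that can run in parallel.

```python
# Illustrative fan-out: one analyzed file becomes many parallel sub-tasks.
# Task kinds, fields, and formats here are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class SubTask:
    kind: str     # e.g. "export_view" or "export_geometry"
    file_id: str
    target: str   # a view name, or a geometry format


def fan_out(file_id, view_names, formats=("ifc", "gltf", "obj")):
    """Turn one analyzed file into independent extraction sub-tasks:
    one per view, plus one per requested geometry format."""
    tasks = [SubTask("export_view", file_id, v) for v in view_names]
    tasks += [SubTask("export_geometry", file_id, fmt) for fmt in formats]
    return tasks
```

Under this scheme, a file with 84 views fans out into 84 view-export sub-tasks plus one sub-task per geometry format, which is what produces the initial plateau (analysis) followed by the spike (parallel extraction) in the chart above.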
Results
All tasks successfully completed in about 10 minutes. Below are some screenshots of the exported views and geometry.
File geometry successfully extracted as IFC, glTF, and OBJ.
One of 84 views extracted
Summary
Across 1,000 files, we were able to extract 74,000 views to PDF and DWG, generate 222,000 mesh-based 3D models, and ingest over 12 million instances with 125 million parameters. All in 10 minutes.