Size matters in real-time machine learning

When enough parameters are enough

Sep 8, 2016

Ask a machine learning expert if they want more parameters to make better predictions, and you might expect a “yes”.

NOT SO FAST.

Turns out that the law of diminishing returns applies to data models. Combine a need for real-time decisions with limited computational and graphics processing resources, and you have a case where less is more.

A real-life Richard Hendricks, except with better public speaking skills

At a recent Silicon Valley Deep Learning Group Meetup, DeepScale CEO & Founder Forrest Iandola presented the case for a lean approach to deep neural networks. Forrest earned a PhD in Electrical Engineering and Computer Science from the University of California, Berkeley.

DeepScale is betting long on neural networks. Forrest believes that neural networks will eat the world. When he founded DeepScale in 2015, he was unsure which big problem to address first. As it turned out, the problem came looking for DeepScale: after fielding numerous inbound requests from autonomous driving companies, DeepScale found its target application.

Making sense of sensor data

Autonomous vehicles are equipped with sensors including LIDAR, RADAR and cameras. This sensor data streams in real time, and the system needs to render the images, interpret the data, and understand what it all means. DeepScale has focused most of its attention on “real-time perception”, which is widely acknowledged as the weak link in autonomous driving technology. The real-time perception challenge includes identifying pedestrians who may unexpectedly cross the vehicle’s path, safety cones marking road work, and traffic signs warning of detours and road hazards, to name a few.

Did the machine guess right?

This is where machine learning comes into play. If the machine consistently makes perception errors, the flawed decision process can be analyzed and corrected.

Example:

The system consistently misidentifies a motorcycle as a small automobile.

What does the computer observe?

Is the object traveling at street level? Yes.
Is the object traveling with the flow of traffic? Yes.
Is the object traveling at the relative speed of traffic? Yes.

Based on these factors, the system might guess that the object is an automobile. However, this could be a dangerous assumption. The system may panic if the motorcycle is lane-splitting near the autonomous vehicle: the vehicle sees a fast-approaching object and assumes that a rear-end collision will take place unless it abruptly changes lanes.

Besides the initial questions listed above, the system needs to consider the object’s shape and analyze its path in more detail. These questions might include:

Does the vehicle’s shape look like a truck, car, or motorcycle? It looks like a motorcycle.
Does the motorcycle appear to be lane-splitting? If yes, the likelihood this is a motorcycle increases.

Based on this enhanced profile of the motorcycle, the system can anticipate a possible lane-splitting scenario instead of responding erroneously.
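The article does not say how DeepScale's system weighs these cues, and a real perception system would learn them from data rather than from hand-written rules. Still, the reasoning above can be made concrete as evidence accumulation. Here is a minimal sketch that combines the observations in a naive-Bayes-style log-odds update; the prior, the likelihood ratios and every name in it are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical prior: fraction of vehicles on the road that are motorcycles.
PRIOR_MOTORCYCLE = 0.1

# Hypothetical likelihood ratios P(observation | motorcycle) / P(observation | car).
# A ratio of 1.0 means the observation cannot tell the two classes apart.
LIKELIHOOD_RATIOS = {
    "at_street_level": 1.0,
    "with_traffic_flow": 1.0,
    "at_traffic_speed": 1.0,
    "motorcycle_shape": 20.0,
    "lane_splitting": 10.0,
}

def motorcycle_probability(observations):
    """Combine observations as log-odds, assuming they are conditionally independent."""
    log_odds = math.log(PRIOR_MOTORCYCLE / (1 - PRIOR_MOTORCYCLE))
    for obs in observations:
        log_odds += math.log(LIKELIHOOD_RATIOS[obs])
    return 1 / (1 + math.exp(-log_odds))

def classify(observations):
    return "motorcycle" if motorcycle_probability(observations) > 0.5 else "car"

basic = ["at_street_level", "with_traffic_flow", "at_traffic_speed"]
print(classify(basic))                                           # car
print(classify(basic + ["motorcycle_shape", "lane_splitting"]))  # motorcycle
```

The three initial questions carry no discriminating evidence (ratio 1.0), so the prior wins and the object is called a car; adding the shape and lane-splitting cues pushes the probability of a motorcycle to roughly 0.96.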

Divide and conquer — massively decrease training time with cluster computing

It turns out image identification is complex. Training a system to identify objects correctly requires massive graphics processing capability. As the chart below (Figure 1) shows, DeepScale has come up with a cluster-computing method that significantly reduces training time compared with a single GPU, without negatively impacting accuracy. Training using cluster computing can reduce training time from weeks to hours.

Figure 1: clustered computing provides a massive reduction in processing time for image recognition training
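The article does not detail DeepScale's cluster method. One common way to split training across machines is synchronous data parallelism: each worker computes gradients on its own slice of the batch, the gradients are averaged, and every worker applies the same update. Below is a minimal pure-Python sketch of that idea, assuming a toy one-weight linear model and using threads to stand in for GPUs; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def gradient(w, xs, ys):
    """Mean-squared-error gradient dL/dw for the toy model y_hat = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def split(seq, k):
    """Split seq into k equal, contiguous shards (assumes len(seq) is divisible by k)."""
    size = len(seq) // k
    return [seq[i * size:(i + 1) * size] for i in range(k)]

def data_parallel_step(w, xs, ys, num_workers, lr):
    """One synchronous step: workers compute shard gradients, which are then averaged."""
    x_shards, y_shards = split(xs, num_workers), split(ys, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        grads = list(pool.map(gradient, [w] * num_workers, x_shards, y_shards))
    avg_grad = sum(grads) / num_workers  # stands in for an all-reduce across GPUs
    return w - lr * avg_grad

xs = [float(i) for i in range(1, 9)]
ys = [3.0 * x for x in xs]  # the true weight is 3.0
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, xs, ys, num_workers=4, lr=0.01)
print(round(w, 4))  # 3.0
```

Because the shards are equal-sized, the averaged gradient equals the full-batch gradient, so each step is mathematically the same as a single-worker step on the whole batch. Python threads here only illustrate the structure (the GIL prevents a real speedup); an actual cluster would exchange gradients between GPUs over a network.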

Quality over quantity for parameters

Consider the game Jenga. One way to view the object of the game is to maintain the building’s height, while minimizing the number of blocks needed for structural integrity. In order to achieve this goal, you could remove seemingly extraneous blocks one at a time. If you pull the wrong block, the building collapses.

To improve results, you could rebuild the structure and try again. Over time, you would get better at determining which blocks are safe to remove.

In a similar vein, DeepScale set up an experiment to see whether they could reduce the number of parameters while maintaining a model’s accuracy. They pruned sets of parameters and measured the model’s accuracy before and after. If accuracy was unaffected, they would consider leaving the removed variables out of the model. If accuracy took a hit, they worked to identify which of the removed variables correlated with the drop. After optimizing the variables, they discovered two important trends:

Beyond a certain number of variables, accuracy reached a saturation point.
They could carefully remove a high percentage of variables with minimal loss of accuracy.
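The article does not spell out DeepScale's exact pruning procedure. As a rough illustration of the prune-and-measure loop described above, here is a minimal sketch on a toy linear classifier: weights are zeroed one at a time, smallest magnitude first (an assumed ordering), and a removal is kept only if accuracy stays within a tolerance of the unpruned baseline. The data, the "trained" weights and the function names are hypothetical.

```python
from itertools import product

def predict(weights, x):
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else -1

def accuracy(weights, data):
    return sum(predict(weights, x) == y for x, y in data) / len(data)

def greedy_prune(weights, data, tolerance=0.01):
    """Zero weights one at a time, smallest magnitude first; keep a removal
    only if accuracy stays within `tolerance` of the unpruned baseline."""
    pruned = list(weights)
    baseline = accuracy(pruned, data)
    for i in sorted(range(len(pruned)), key=lambda j: abs(pruned[j])):
        saved, pruned[i] = pruned[i], 0.0
        if accuracy(pruned, data) < baseline - tolerance:
            pruned[i] = saved  # accuracy took a hit: put the weight back
    return pruned

# Toy data: every +/-1 feature combination; labels depend only on features 0, 1 and 5.
data = [(x, 1 if 2 * x[0] - 1.5 * x[1] + x[5] > 0 else -1)
        for x in product([-1.0, 1.0], repeat=6)]
weights = [2.0, -1.5, 0.05, 0.02, -0.03, 1.0]  # hypothetical "trained" weights

pruned = greedy_prune(weights, data)
print(pruned)                  # [2.0, -1.5, 0.0, 0.0, 0.0, 1.0]
print(accuracy(pruned, data))  # 1.0
```

Half of the weights disappear with no loss of accuracy, while removing any of the three large weights drops accuracy to 0.75 and is rolled back. Real networks are pruned in groups and usually retrained afterwards, which this sketch leaves out.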

Figure 2: Limited accuracy improvement despite larger model size

The end result is that they were able to build small but accurate models. These smaller models are particularly valuable for real-time image recognition solutions using convolutional neural networks (CNNs).

Figure 3: Model size without degradation in accuracy

As the chart above (Figure 3) shows, DeepScale achieves its small model size without the need for compression.

Why smaller is better, sometimes

Evaluating a single image requires millions of floating point operations, which makes deploying large models on an automobile’s on-board computers problematic. Using a small model can dramatically reduce the computational requirements for the on-board computer. A further benefit of smaller models, especially those that do not require compression, is that they are faster and less expensive to deploy and update.
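To see where those operations come from, and how model size and compute shrink together, here is a minimal sketch that counts the weights and floating point operations of a single convolutional layer. It uses the standard count of K × K × C_in weights per output channel and one multiply-add per weight per output position; the layer dimensions are hypothetical, biases are ignored, and weights are assumed to be stored as 32-bit floats.

```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    """A 2-D convolution with square K x K filters (stride and padding folded into out_h/out_w)."""
    in_channels: int
    out_channels: int
    kernel: int
    out_h: int
    out_w: int

    def weights(self) -> int:
        # One K x K x C_in filter per output channel (biases ignored).
        return self.kernel * self.kernel * self.in_channels * self.out_channels

    def flops(self) -> int:
        # Every weight is used once per output position; count 2 FLOPs per multiply-add.
        return 2 * self.out_h * self.out_w * self.weights()

big = ConvLayer(in_channels=128, out_channels=128, kernel=3, out_h=56, out_w=56)
small = ConvLayer(in_channels=128, out_channels=64, kernel=1, out_h=56, out_w=56)

for name, layer in [("3x3, 128 filters", big), ("1x1, 64 filters", small)]:
    size_mb = layer.weights() * 4 / 1e6  # 4 bytes per 32-bit float weight
    print(f"{name}: {layer.weights():,} weights, {size_mb:.2f} MB, {layer.flops():,} FLOPs")
# 3x3, 128 filters: 147,456 weights, 0.59 MB, 924,844,032 FLOPs
# 1x1, 64 filters: 8,192 weights, 0.03 MB, 51,380,224 FLOPs
```

Switching this hypothetical layer from 3x3 to 1x1 filters and halving its output channels cuts both the stored weights and the arithmetic by a factor of 18. The savings are multiplicative rather than exponential, but they compound across every layer of a network.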

More variables can take us only so far

As better sensors, faster graphics processing units (GPUs) and better models are introduced, image recognition accuracy will continue to improve. Haphazardly throwing more variables at the problem is not the answer, though. In the case of autonomous vehicles, variable overload is unnecessary and detrimental. For image identification models, identifying and strengthening the variables that carry signal, while removing those that are noise, maximizes computational efficiency and accelerates learning.

Even in the field of big data, ingenuity trumps brute force.