The key is to quickly and progressively cut out the areas that can't contain people (such as the sky or empty road) and lean on deep learning only in the last stages, when you need complex image recognition to confirm what you're looking at. That saves much of the computing power normally needed for pedestrian recognition, since the system focuses on just a handful of candidate areas instead of large chunks of the frame.
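As a rough illustration of that coarse-to-fine idea (not UCSD's actual pipeline), here is a minimal Python sketch: a cheap cascade classifier rejects most of the frame first, and the expensive deep-learning check runs only on the few regions that survive. The `haarcascade_fullbody.xml` file ships with OpenCV; `cnn_verify` and `street.jpg` are hypothetical placeholders.

```python
import cv2

# Cheap early stage: a stock Haar cascade quickly discards sky, road, etc.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_fullbody.xml")

def cnn_verify(patch) -> bool:
    """Placeholder for the expensive deep-learning confirmation stage.

    A real system would run a CNN on the patch here; returning True
    keeps this sketch runnable without a trained model.
    """
    return True

def detect_pedestrians(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Stage 1 (fast): scan the whole frame, keep only candidate boxes.
    candidates = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=3)
    # Stage 2 (slow): run the deep model only on the handful of survivors.
    return [(x, y, w, h) for (x, y, w, h) in candidates
            if cnn_verify(frame[y:y + h, x:x + w])]

if __name__ == "__main__":
    frame = cv2.imread("street.jpg")  # any test image with pedestrians
    if frame is not None:
        print(detect_pedestrians(frame))
```

The point of the split is economics: the first stage is cheap enough to run everywhere, so the costly network only ever sees a few small patches per frame.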
UCSD's current system can only recognize one object type at a time, so you couldn't just drop it into a car by itself. However, the team plans to have it detect multiple object types, which would make it far more practical. And it's not limited to vehicles, either: it could be useful in robots, security cameras and other devices that need to spot humans in a heartbeat.
This article by Jon Fingas originally ran on Engadget, the definitive guide to this connected life.