
There’s Data in the Air

May 13, 2024
Alex Wiltschko

With data shortages looming, AI needs to embrace new data modalities — and grant computers a fresh set of senses. 

In the past few years, thanks to breakthroughs in AI, our computers have gotten a great deal smarter. From Large Language Models like ChatGPT to AI Diffusion Models like Midjourney, our premier Generative AI models can solve problems that, until recently, seemed beyond the reach of computation. But to keep learning at pace, these models require vast new data reservoirs, and that data is proving increasingly scarce. Our chatbots have absorbed almost all available text. Image and video modalities offer more room for growth, but copyright restrictions are proving a costly bottleneck.

We don’t just need more data: we need entirely new data modalities. In other words, we need to grant computers a fresh set of senses. Scent is a natural new frontier for this evolution. The oldest sense known to life on earth — tangible, physical, rooted in chemistry — is a vast untapped data source. Every breath we take in carries a unique blend of molecules directly to our brain, each holding valuable information about our bodies and our environment. Our goal at Osmo is to awaken computers to this essential layer of reality. 

Training AI systems to predict or replicate tasks relies on exposure to relevant examples. For instance, LLMs can craft detective stories because they have gorged on so many of them. Image generation models can portray a wizard holding a coffee cup because they've encountered numerous instances of both. This same principle extends to scent data, which has the added benefit of being as free and plentiful as the air around us. 

Over evolutionary time, scents have emerged from countless sources, resulting in a nearly inexhaustible array of combinations. Just as you can't capture the same photograph twice, every scent signature is singular. In short, we are surrounded by an almost endless trove of data. It contains information about our health, the safety of our homes and cars, the quality of the air we breathe, and even our emotions.

Scent digitization means transforming all these molecules into actionable insights. Naturally, challenges persist. Previous approaches to collecting scents have been prohibitively expensive and time-consuming, requiring a team working for months to capture a single smell. New systems are needed to streamline this process, like our Principal Odor Map, which has solved a genuine mystery: which molecules create which scents. This breakthrough offers us a new way to make scent readable as data.

We are now working to refine the methods by which we record a smell and reproduce it. We call this process scent teleportation: recording a smell in one place, analyzing it, and reproducing it elsewhere. We want to make it as easy as sending a photo. We are still in the early stages, but every breakthrough along the way takes us one step closer to unlocking scent's full potential, and to revolutionizing AI integration in the process.

Our computers won’t only be smarter; they will have a much more rounded sense of the world, with tangible benefits for us all.