At Oura, we’re committed to consistently raising the bar for accuracy. Today, we begin rolling out our new sleep staging algorithm to our members. This algorithm is among the most accurate sleep staging algorithms available in a consumer wearable, achieving 79% agreement with a polysomnography (PSG) sleep lab test, the gold standard of sleep testing, for 4-stage sleep classification (wake, light, deep, and rapid eye movement (REM) sleep).
The research behind this algorithm took years to perfect and relied on advanced machine learning techniques trained on one of the largest sleep datasets ever collected.
“The new sleep staging algorithm highlights Oura’s commitment to accuracy and scientific validation as core pillars of our product, as well as our relentless pursuit to empower members on their health journey by delivering deep, personalized health insights,” says Shyamal Patel, PhD, Head of Science at Oura.
To learn more about this undertaking, we turned to members of the Oura science team: Raphael Vallat, PhD, Senior Machine Learning Data Scientist and former sleep researcher at the University of California, Berkeley; Gerald Pho, PhD, Senior Machine Learning Data Scientist; and Xi Zhang, PhD, Head of Health Sensing.
Below, these scientists share what makes the new sleep staging algorithm stand out, what members need to know, and the challenges and wins along the way.
What’s New?
While the new sleep staging algorithm is more accurate than ever, rest assured that the old sleep staging algorithm was already one of the best in the wearable space and was independently validated for accuracy.
The new algorithm, however, raises the bar. Now, Oura Ring achieves 79% agreement when compared to gold-standard PSG lab testing. This is a remarkable achievement, considering that studies have independently found that agreement among human experts scoring the same PSG study is around 88% for 4-class sleep staging and around 83% for 5-class sleep staging.
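For readers curious about what "agreement" means in practice, here is a minimal sketch of how such a figure can be computed. It assumes the PSG recording and the ring's output have both been scored as aligned, fixed-length epochs with one stage label each, which is a common convention but not something this article specifies. The function name and the ten-epoch example night are made up for illustration; this is not Oura's validation code.

```python
from typing import Sequence


def percent_agreement(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of epochs (in percent) where both scorings give the same stage."""
    if len(reference) != len(predicted):
        raise ValueError("Both hypnograms must cover the same number of epochs")
    if not reference:
        raise ValueError("At least one epoch is required")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return 100.0 * matches / len(reference)


# Hypothetical, made-up labels for ten epochs (not real data).
psg = ["wake", "light", "light", "deep", "deep", "rem", "rem", "light", "wake", "light"]
ring = ["wake", "light", "deep", "deep", "deep", "rem", "light", "light", "wake", "light"]

agreement = percent_agreement(psg, ring)
print(f"{agreement:.0f}% agreement")  # 8 of 10 epochs match
```

The same calculation applies whether the second scoring comes from a wearable or from another human expert, which is why the ring's figure can be compared with the expert agreement figures above.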
Additionally, across all sleep stages, the new sleep staging algorithm has higher sensitivity, accuracy, and specificity, ranging from 74% to 98%. While other studies have shown similar results for the detection of a specific sleep stage, this improved performance typically comes at the expense of the other stages (e.g. high performance in detecting deep sleep might result in a poor ability to detect REM).
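To make the trade-off between stages concrete, the sketch below computes sensitivity, specificity, and accuracy for one stage at a time by treating that stage as "positive" and every other stage as "negative". This one-vs-rest framing is an assumption made for illustration, as are the function name and the made-up ten-epoch night; the article does not describe how Oura computed its figures.

```python
from typing import Dict, Sequence


def stage_metrics(reference: Sequence[str], predicted: Sequence[str], stage: str) -> Dict[str, float]:
    """One-vs-rest sensitivity, specificity, and accuracy for a single sleep stage."""
    if len(reference) != len(predicted):
        raise ValueError("Both hypnograms must cover the same number of epochs")
    tp = fn = fp = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref == stage and pred == stage:
            tp += 1  # stage present and detected
        elif ref == stage:
            fn += 1  # stage present but missed
        elif pred == stage:
            fp += 1  # stage reported but not present
        else:
            tn += 1  # stage correctly not reported
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Hypothetical, made-up labels for ten epochs (not real data).
psg = ["wake", "light", "light", "deep", "deep", "rem", "rem", "light", "wake", "light"]
ring = ["wake", "light", "deep", "deep", "deep", "rem", "light", "light", "wake", "light"]

deep = stage_metrics(psg, ring, "deep")
rem = stage_metrics(psg, ring, "rem")
print("deep:", deep)
print("rem:", rem)
```

In this toy night, every deep sleep epoch is caught (sensitivity of 1.0) while only half of the REM epochs are (sensitivity of 0.5), which is the kind of imbalance between stages described above.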
That means Oura’s new sleep staging algorithm can better detect which sleep stage you’re in throughout the night. Increased sensitivity across the sleep stages improves the accuracy of not only your Sleep Score but also your Readiness Score, giving you a better reflection of how primed your body is for the day overall.
The Oura Difference: Diversity in Data
As you sleep, Oura Ring monitors your body signals, like your heart rate, movement, and body temperature trends, to determine when you have fallen asleep and which sleep stage you’re in. This is possible because each sleep stage (awake, light sleep, REM, and deep sleep) is characterized by distinct biosignals.
To develop the new sleep staging algorithm, “we collected one of the largest sleep wearable datasets to train an algorithm to better detect these biosignals and the associated stage of sleep, in a more diverse population,” Vallat tells us. “The improvement in accuracy ultimately provides Oura members with better insights into their sleeping patterns and overall health.”
The development process has involved more than two years of extensive research. “We actively built this dataset, which contains more than 1,200 nights of sleep using PSG and Oura Ring data from sleep labs around the world,” Vallat says.
The results from their research were published in the peer-reviewed scientific journal Sensors. The paper gives an in-depth look into how the algorithm works, and it’s publicly accessible to ensure anyone who wants to understand the technical details of how our sleep algorithm works can do so — demonstrating Oura’s commitment to transparency.
“Unlike previous datasets, the one we were collecting was from a diverse population with varied sleeping patterns and backgrounds; for example, people with different skin tones, health statuses, ages, and sleep disorders.”
Having a heterogeneous dataset ensures that the algorithm has been trained on — and therefore will work well across — a wide range of individuals. At Oura, one of our values is human-first, so we are committed to making sure our algorithm performs well for everyone.
“Since the study was published in 2021, we have continued to expand our training and testing databases,” Vallat says. “With more than double the dataset, we have increased the diversity of patients to ensure high accuracy across population types.”
Typically, sleep staging algorithms are developed on a limited amount of data (fewer than 100 nights of sleep) taken from a homogeneous population, such as healthy young university students. That means that when used by a broader and more diverse population, the algorithm inevitably becomes less accurate.
About the Development Timeline
The road from research and development to the productization of scientific algorithms takes time and poses unique challenges.
“Science is time-intensive,” Pho says. “It can take years for proper testing and validation to be completed, and our new algorithm is no exception to this. We wanted to ensure that the research version of the algorithm matches the final in-product implementation, which means ensuring that it works as expected on our much larger and diverse member base.”
“Plus, even when the science is ‘ready,’ it’s built outside of the ring as a theoretical algorithm, and needs to be synchronized with the hardware and the app,” Zhang explains. “Translating the algorithm into software has been an extensive process that involves changing the ring firmware, app and cloud development, and more.”
“Because the new sleep staging algorithm is foundational and has many downstream consequences, like affecting the overall Sleep Score and Readiness Score, we had to make calibrations to this to ensure that there was good overall continuity,” Pho says. “That’s why we released a beta in November 2022. We wanted to collect millions of nights’ worth of member data using concurrent algorithms for cohesion.”
Ultimately, over the course of two years, Oura researchers and scientists from across the globe were able to significantly expand our scientific knowledge base, and the work continues as we’re constantly adding to the dataset and updating the algorithm.
What Changes Might Oura Members See?
“As we used data from a more varied population, we noticed some key changes based on age and heart rate variability (HRV),” Vallat explains.
Changes Oura members may see in their sleep metrics include the following:
- Most will see an increase in light sleep. But rest assured: while it’s called “light sleep,” it still delivers benefits for your brain and body.
- If you have a higher HRV, you may see a decrease in awake time and deep sleep, and an increase in REM sleep.
- People with lower HRV may see a decrease in REM sleep, as well as an increase in awake time.
Looking Forward
It’s important to acknowledge that the science of sleep evolves with time — and so does the science behind Oura. “Our goal is to deliver the most accurate metrics and insights possible with the science and technology available today,” Patel says. “As science and technology evolve, we will continue evolving with it to push the boundaries of what is possible with Oura.”
“We are proud to be a science-led company and will continue investing in research and development efforts to bring the most advanced health-sensing capabilities to Oura members,” says Patel.
About the Oura Experts
Shyamal Patel, PhD, is the Head of Science at Oura, where he leads an interdisciplinary organization focused on research and development of algorithms that translate sensor data into accurate measures of health and wellbeing. Shyamal has a PhD in electrical engineering with a specialization in signal processing and applied machine learning from Northeastern University. He completed his post-doctoral research at Harvard University and lives in Boston.
Gerald Pho, PhD, is a Senior Machine Learning Data Scientist at Oura. He has a PhD in Neuroscience, and since joining Oura has contributed to the development and deployment of several algorithms including the new sleep staging algorithm, Workout Heart Rate, and Health Risk Management.
Raphael Vallat, PhD, is a Senior Machine Learning Data Scientist at Oura. He previously worked as a sleep researcher in the Center for Human Sleep Science at the University of California, Berkeley (Prof. Matt Walker’s lab). He has published extensively on the topic of sleep and human health, and his work has been featured in several major news outlets and podcasts.
Xi Zhang, PhD, is the Head of Health Sensing at Oura. He supports a global team of scientists who research and develop algorithms for multiple health applications. Dr. Zhang received his PhD from the University of Michigan and built the world’s first non-invasive thrombolysis robot. After graduate school, he worked at Fitbit and then Apple. At Fitbit, he led an internal startup and delivered several key heart health features, such as 24/7 heart rate monitoring and FDA-approved atrial fibrillation detection. He also serves on the editorial board of the journal Ultrasound in Medicine and Biology, focusing on ML/AI applications.