
Tether launches on-device medical AI that outperforms Google’s models in benchmark tests


Tether’s AI Research Group has released QVAC MedPsy-1.7B and MedPsy-4B, specialized text-only medical language models built to run directly on low-power devices such as smartphones and wearables.

According to the team, the models outperform larger medical AI systems, including Google's, on a range of standard benchmarks, and perform comparably to far bigger systems on medical reasoning and knowledge tasks while keeping all execution local and private.

Traditional AI systems in healthcare rely on large cloud-hosted models, requiring sensitive data like patient records and diagnostic inputs to be transmitted to external servers, creating privacy and compliance risks. This architecture is increasingly under pressure as the healthcare AI sector is projected to grow from roughly $36 billion today to potentially over $500 billion by 2033.

Tether’s team says QVAC MedPsy challenges the scaling paradigm by focusing on efficiency.

The smartphone-friendly 1.7B model scored an average of 62.62 across seven standard medical benchmarks, beating Google's MedGemma-1.5-4B-it by more than 11 points despite being less than half its size, according to the researchers. It also outperformed MedGemma 27B on real-world clinical tasks such as HealthBench Hard.

The 4B model scored 70.54 on the same tests, surpassing MedGemma-27B, a model nearly seven times its size, and delivered strong performance on HealthBench, HealthBench Hard, and MedXpertQA.

The results span eight benchmark sets, including MedQA, MedMCQA, MMLU Health, PubMedQA, AfriMedQA, MedXpertQA, and HealthBench. The team credits a staged medical training pipeline that combines supervised fine-tuning, curated clinical reasoning data, and reinforcement learning.

“With QVAC MedPsy, our focus was improving efficiency at the model level, rather than scaling up size,” Tether CEO Paolo Ardoino commented on the release.

The researchers note that the models are practical as well as capable. They respond quickly with concise yet complete answers, saving time and battery life, and they ship in compressed formats that fit comfortably on mobile devices with little loss in quality.

On output efficiency, the 4B model generates responses averaging roughly 909 tokens, compared with about 2,953 for comparable systems, a 3.2x reduction. The 1.7B model averages around 1,110 tokens versus 1,901, a 1.7x cut.

Both models are being released in quantized GGUF format, with compressed versions weighing approximately 1.2 GB and 2.6 GB respectively.
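For readers who want to experiment, the sketch below shows one way a quantized GGUF model of this size could be run locally with the open-source llama-cpp-python library. The file name, prompt, and settings are illustrative assumptions, not official Tether artifacts.

# Minimal on-device inference sketch using llama-cpp-python
# (pip install llama-cpp-python). The GGUF file name is a placeholder;
# substitute the actual file from Tether's Hugging Face release.
from llama_cpp import Llama

llm = Llama(
    model_path="qvac-medpsy-1.7b-q4.gguf",  # hypothetical ~1.2 GB quantized file
    n_ctx=4096,     # context window
    n_threads=4,    # tune to the target device's CPU
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a careful medical assistant."},
        {"role": "user", "content": "List common early symptoms of iron-deficiency anemia."},
    ],
    max_tokens=256,   # short, complete answers keep latency and battery use down
    temperature=0.2,
)
print(response["choices"][0]["message"]["content"])

Everything in this sketch runs on the local CPU; once the model file is on disk, no network call is made, which is the privacy property the release emphasizes.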

“That combination matters because it directly reduces compute requirements, latency, and cost. It allows the model to run locally on standard hardware instead of relying on remote infrastructure,” Ardoino added. “In healthcare, that changes the constraints entirely; you can run medical reasoning where the data already exists, inside a hospital system or on a device, without moving sensitive information through the cloud or waiting on external processing.”

The models are now available for free under an open license on Hugging Face.
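As a companion to the inference sketch above, here is a minimal way such a release could be fetched from Hugging Face with the huggingface_hub client. The repository and file names are assumptions for illustration; check the official model page for the real identifiers.

# Download sketch (pip install huggingface_hub). Names are hypothetical.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="tetherto/QVAC-MedPsy-1.7B",   # hypothetical repository id
    filename="qvac-medpsy-1.7b-q4.gguf",   # hypothetical quantized file (~1.2 GB)
)
print("Model saved to", local_path)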

Disclosure: This article was edited by Vivian Nguyen. For more information on how we create and review content, see our Editorial Policy.


