New Delhi: Google has developed an AI-powered spam detection system that can help spot “adversarial text manipulations,” such as emails laced with special characters, emojis, typos and other tricks that previously slipped past Gmail’s defenses with ease.
Touted as “one of the largest defense upgrades in recent years,” the change is built on a new text vectorizer called RETVec (Resilient and Efficient Text Vectorizer).
“To help make text classifiers more robust and efficient, we’ve developed a novel, multilingual text vectorizer called RETVec that helps models achieve state-of-the-art classification performance and drastically reduces computational cost,” the company said.
Systems such as Gmail, YouTube and Google Play rely on text classification models to identify harmful content including phishing attacks, inappropriate comments, and scams.
Such texts are harder for machine learning models to classify because bad actors use adversarial text manipulations to evade the classifiers.
“For example, they will use homoglyphs, invisible characters, and keyword stuffing to bypass defenses,” said the tech giant.
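To make these manipulations concrete, the minimal sketch below shows how a homoglyph and an invisible character can slip past a naive keyword filter. The blocklist, the sample messages and the function name `naive_is_spam` are hypothetical illustrations, not part of Gmail or RETVec.

```python
import unicodedata

# Hypothetical blocklist for illustration only.
BLOCKLIST = {"free", "prize"}


def naive_is_spam(text: str) -> bool:
    """Flag a message if any whitespace-separated word is on the blocklist."""
    return any(word in BLOCKLIST for word in text.lower().split())


def describe_suspicious_chars(text: str) -> list:
    """List the Unicode names of non-ASCII characters found in the text."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in text if ord(ch) > 127]


clean = "claim your free prize"
# Homoglyph attack: Latin "e" replaced by Cyrillic "е" (U+0435).
homoglyph = "claim your fr\u0435\u0435 priz\u0435"
# Invisible-character attack: zero-width space (U+200B) inside keywords.
invisible = "claim your f\u200bree pr\u200bize"

print(naive_is_spam(clean))      # True
print(naive_is_spam(homoglyph))  # False: looks identical, matches nothing
print(naive_is_spam(invisible))  # False: hidden characters break the match
print(describe_suspicious_chars(invisible))
```

All three messages look the same to a human reader, but only the first one matches the exact-word filter.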
According to Google, RETVec’s novel architecture lets it work out of the box on every language and all characters without the need for text preprocessing, making it a strong candidate for on-device, web, and large-scale text classification deployments.
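One way to see why a character-level vectorizer needs no language-specific vocabulary or preprocessing is the simplified sketch below, which turns each character’s Unicode code point into a fixed-width bit vector. This is an assumption-laden illustration of the general idea, not RETVec’s actual implementation; the function names, the 24-bit width and the 16-character word length are chosen here for illustration.

```python
from typing import List


def encode_char(ch: str, bits: int = 24) -> List[int]:
    """Encode one character as the binary digits of its code point, least significant bit first."""
    code_point = ord(ch)
    return [(code_point >> i) & 1 for i in range(bits)]


def encode_word(word: str, max_chars: int = 16, bits: int = 24) -> List[List[int]]:
    """Encode a word as a fixed-size grid: truncate to max_chars, then pad with zero rows."""
    rows = [encode_char(ch, bits) for ch in word[:max_chars]]
    rows += [[0] * bits for _ in range(max_chars - len(rows))]
    return rows


# Any script or symbol is handled the same way, with no vocabulary lookup.
for sample in ["hello", "привет", "こんにちは", "fr\u0435\u0435 🙂"]:
    grid = encode_word(sample)
    print(sample, len(grid), len(grid[0]))
```

Because every Unicode code point fits in 24 bits, no input character is ever out of vocabulary in this scheme; a real vectorizer would feed such a representation into a trained embedding model.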
“Models trained with RETVec exhibit faster inference speed due to its compact representation. Having smaller models reduces computational costs and decreases latency, which is critical for large-scale applications and on-device models,” the tech giant said.
RETVec is a novel open-source text vectorizer that allows people to build more resilient and efficient server-side and on-device text classifiers.
The Gmail spam filter uses it to help protect Gmail inboxes against malicious emails, said Google.