San Francisco: Microsoft-backed OpenAI has announced its new large multimodal model “GPT-4”, which accepts image and text inputs.
“We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning,” the company said in a blogpost on Tuesday.
“We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results on factuality, steerability, and refusing to go outside of guardrails.”
Compared with GPT-3.5, the new AI model is more reliable, more creative, and better at handling complex instructions.
GPT-4 outperforms existing large language models (LLMs), including most state-of-the-art (SOTA) models, some of which incorporate benchmark-specific tuning or additional training methods.
“In 24 of 26 languages tested, GPT-4 outperforms the English-language performance of GPT-3.5 and other LLMs (Chinchilla, PaLM), including for low-resource languages such as Latvian, Welsh, and Swahili,” the company said.
The company has also been using the new model internally, with significant impact on functions such as support, sales, content moderation and programming.
In contrast to the text-only setting, this model can accept a prompt with both text and images, allowing users to specify any vision or language task.
The GPT-4 base model, like earlier GPT models, was trained to predict the next word in a document, using both licensed and publicly available data.
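The next-word objective mentioned above can be illustrated with a deliberately simple toy: a bigram counter that predicts the most frequent follower of a word. This is a teaching sketch only — GPT-4 itself is a large transformer network, not a bigram table — and the function names here are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

corpus = "the model predicts the next word in the document"
model = train_bigram(corpus)
print(predict_next(model, "next"))  # prints "word"
```

The real models scale this same idea — predicting the next token from context — up to billions of parameters and vast training corpora.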
ChatGPT Plus subscribers will get GPT-4 access on chat.openai.com with a usage cap, while developers can sign up for the GPT-4 API’s waitlist.
“We look forward to GPT-4 becoming a valuable tool in improving people’s lives by powering many applications,” the company said.
IANS