Source link : https://tech365.info/ai2-releases-molmoweb-an-open-weight-visible-internet-agent-with-30k-human-process-trajectories-and-a-full-coaching-stack/

Engineers building browser agents today face a choice between closed APIs they can’t inspect and open-weight frameworks with no trained model underneath them. Ai2 is now offering a third option.

The Seattle-based nonprofit behind the open-source OLMo language models and the Molmo vision-language family is today releasing MolmoWeb, an open-weight visual web agent available in 4 billion and 8 billion parameter sizes.

Until now, no open-weight visual web agent shipped with the training data and pipeline needed to audit or reproduce it. MolmoWeb does.

MolmoWebMix, the accompanying dataset, includes 30,000 human task trajectories across more than 1,100 websites, 590,000 individual subtask demonstrations and 2.2 million screenshot question-answer pairs, which Ai2 describes as the largest publicly released collection of human web-task execution ever assembled.

“Can you go from just passively understanding images, describing them and captioning them, to actually making them take action in some environment?” Tanmay Gupta, senior research scientist at Ai2, told VentureBeat. “That is exactly what MolmoWeb is.”

How it works: It sees what you see

MolmoWeb operates entirely from browser screenshots. It doesn’t parse HTML or rely on accessibility tree representations of a page. At each step it receives a task instruction, the current screenshot, a text log of previous actions…
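The per-step inputs named above can be pictured with a minimal sketch. This is not Ai2's code or MolmoWeb's actual interface: the class `StepInput`, the function `render_text_context`, and the action string format are all hypothetical, and because the excerpt is cut off mid-list, only the three inputs it names are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class StepInput:
    """Hypothetical container for the three per-step inputs the article names."""
    instruction: str        # the task instruction
    screenshot_png: bytes   # the current browser screenshot, as raw image bytes
    action_log: list[str] = field(default_factory=list)  # text log of previous actions


def render_text_context(step: StepInput) -> str:
    """Format the textual inputs; the screenshot would be passed to the model separately."""
    lines = [f"Task: {step.instruction}", "Previous actions:"]
    if step.action_log:
        # Number the earlier actions so their order is explicit.
        lines += [f"{i}. {a}" for i, a in enumerate(step.action_log, start=1)]
    else:
        lines.append("(none)")
    return "\n".join(lines)


if __name__ == "__main__":
    # The action string below is an invented placeholder, not MolmoWeb's action syntax.
    step = StepInput(
        instruction="Find the cheapest flight",
        screenshot_png=b"\x89PNG",
        action_log=["click(120, 340)"],
    )
    print(render_text_context(step))
```

Note that no HTML or accessibility tree appears anywhere in the structure, which is the point the article makes: the page itself reaches the agent only as pixels.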

—-

Author : tech365

Publish date : 2026-03-24 21:23:00

Copyright for syndicated content belongs to the linked Source.

—-
