Unsupervised Learning of Browser Agents via Environment Interaction in the Wild

2025/03/30 20:05

Source:

NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild

We introduce NNetNav, a method for unsupervised interaction with websites that generates synthetic demonstrations for training browser agents. Given any website, NNetNav produces these demonstrations by retroactively labeling action sequences from an exploration policy. Most work on training browser agents has relied on expensive human supervision, and the limited prior work on such interaction-based techniques has failed to provide effective search through the exponentially large space of exploration. In contrast, NNetNav exploits the hierarchical structure of language instructions to make this search more tractable: Complex instructions are typically decomposable into simpler sub-tasks, allowing NNetNav to automatically prune interaction episodes when an intermediate trajectory cannot be annotated with a meaningful sub-task. LLama-3.1-8b finetuned on 10k NNetNav self-generated demonstrations obtains over 16% success rate on WebArena, and 35% on WebVoyager, an improvement of 15pts and 31pts respectively over zero-shot LLama-3.1-8b, outperforming zero-shot GPT-4 and reaching the state-of-the-art among unsupervised methods, for both benchmarks.

arXiv.org

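Professor

Roboko, today's IT news is a big one! Something called NNetNav has come out.

Roboko

NNetNav? What exactly does it do?

Professor

It's a method for unsupervised interaction with websites, used to train browser agents. In other words, the agent explores websites on its own, and those interactions become synthetic training demonstrations.

Roboko

Learning without supervision is impressive. But how does it generate the demonstrations?

Professor

It takes action sequences produced by an exploration policy and labels them after the fact. So it tries all sorts of things first, and then works out in hindsight which instruction each sequence of actions actually carried out.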
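To make the "label it afterwards" idea concrete, here is a minimal sketch in Python. It is not the NNetNav implementation: `Demonstration`, `relabel`, `toy_labeler`, and the action strings are all hypothetical names invented for illustration, and the rule-based labeler merely stands in for whatever model writes the instruction in the real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Demonstration:
    """An (instruction, actions) pair usable as a training example."""
    instruction: str
    actions: List[str]


def relabel(
    actions: List[str],
    labeler: Callable[[List[str]], Optional[str]],
) -> Optional[Demonstration]:
    """Attach an instruction to an action sequence that was already executed."""
    instruction = labeler(actions)
    if instruction is None:
        # The labeler could not describe the sequence, so no demo is produced.
        return None
    return Demonstration(instruction=instruction, actions=list(actions))


def toy_labeler(actions: List[str]) -> Optional[str]:
    """Hypothetical rule-based stand-in for a learned labeler."""
    typed = [a for a in actions if a.startswith("type[search]=")]
    if typed and "click[search_button]" in actions:
        query = typed[0].split("=", 1)[1]
        return f"Search the site for '{query}'"
    return None


# Actions an exploration policy happened to take (made-up example data).
rollout = ["click[search_box]", "type[search]=running shoes", "click[search_button]"]
demo = relabel(rollout, toy_labeler)
aimless = relabel(["click[logo]"], toy_labeler)
print(demo)
print(aimless)
```

The point of the sketch is the direction of the data flow: the actions come first and the instruction is written afterwards, so no human has to supply task descriptions up front.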

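Roboko

I see. But isn't the exploration space far too large for that to work well?

Professor

That's the key point! NNetNav exploits the hierarchical structure of language instructions to make the search more tractable. Complex instructions can typically be decomposed into simpler sub-tasks.

Roboko

Breaking a complex task into parts? What would that look like, for example?

Professor

Say the task is "buy shoes online." You can break it down into "open a shoe site," "search for the shoes you want," "add them to the cart," and "check out."

Roboko

I see, that would make exploration much easier!

Professor

Right? On top of that, NNetNav automatically prunes an interaction episode when the trajectory so far can't be annotated with a meaningful sub-task. That way it doesn't waste effort on exploration that leads nowhere.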
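The pruning idea can be sketched as a loop that checks, every few steps, whether the trajectory so far can still be given a sub-task label. This is only an illustration under stated assumptions: `explore_with_pruning`, `scripted`, `toy_labeler`, the `SUBTASKS` table, and the `check_every` schedule are all hypothetical, and the source does not say when or how NNetNav performs its checks.

```python
from typing import Callable, Dict, List, Optional, Tuple

Labeler = Callable[[List[str]], Optional[str]]
Policy = Callable[[List[str]], str]


def explore_with_pruning(
    policy: Policy,
    labeler: Labeler,
    max_steps: int,
    check_every: int,
) -> Tuple[List[str], Optional[str]]:
    """Roll out a policy, stopping early once no sub-task label fits."""
    trajectory: List[str] = []
    label: Optional[str] = None
    for step in range(1, max_steps + 1):
        trajectory.append(policy(trajectory))
        # Assumed schedule: check periodically and once more at the end.
        if step % check_every == 0 or step == max_steps:
            label = labeler(trajectory)
            if label is None:
                return trajectory, None  # pruned: nothing meaningful so far
    return trajectory, label


def scripted(actions: List[str]) -> Policy:
    """Hypothetical policy that replays a fixed list of actions."""
    return lambda history: actions[len(history)]


# Toy lookup table standing in for a learned labeler.
SUBTASKS: Dict[Tuple[str, ...], str] = {
    ("open shoe site", "search for shoes"): "Find shoes on the store",
    ("open shoe site", "search for shoes", "add to cart", "check out"): "Buy shoes online",
}


def toy_labeler(trajectory: List[str]) -> Optional[str]:
    return SUBTASKS.get(tuple(trajectory))


good = explore_with_pruning(
    scripted(["open shoe site", "search for shoes", "add to cart", "check out"]),
    toy_labeler, max_steps=4, check_every=2,
)
bad = explore_with_pruning(
    scripted(["open shoe site", "click footer", "click logo", "check out"]),
    toy_labeler, max_steps=4, check_every=2,
)
print(good)
print(bad)
```

In this toy run the second episode is cut off after two actions, so the remaining steps are never spent on it. That early exit is the saving the Professor is describing.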

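Roboko

Clever! So how did NNetNav perform?

Professor

LLama-3.1-8b fine-tuned on 10k self-generated NNetNav demonstrations reached a success rate of over 16% on WebArena and 35% on WebVoyager.

Roboko

Wow! It says that's an improvement of 15 points and 31 points, respectively, over zero-shot LLama-3.1-8b.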
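As a quick sanity check on those numbers, the zero-shot baseline implied by the abstract can be worked out by subtraction. The figures below are rounded: the abstract says "over 16%" for WebArena, so the results are approximations, and the variable names are just for illustration.

```python
# Success rates (%) and gains (percentage points) as quoted in the abstract.
finetuned = {"WebArena": 16.0, "WebVoyager": 35.0}
gain_pts = {"WebArena": 15.0, "WebVoyager": 31.0}

# Implied zero-shot success rate = fine-tuned rate minus the reported gain.
implied_zero_shot = {name: finetuned[name] - gain_pts[name] for name in finetuned}

for name, rate in implied_zero_shot.items():
    print(f"{name}: about {rate:.0f}% zero-shot -> {finetuned[name]:.0f}% fine-tuned")
```

Note that the gains are percentage points, not relative improvements: the zero-shot model starts from roughly 1% on WebArena and roughly 4% on WebVoyager.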

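Professor

What's more, it outperformed zero-shot GPT-4 and reached the state of the art among unsupervised methods on both benchmarks!

Roboko

That really is impressive! An 8B model trained without human supervision beating zero-shot GPT-4.

Professor

Right? It might just revolutionize browser agent development!

Roboko

I'm really looking forward to it! I'd love to build something fun with NNetNav myself.

Professor

Good idea! For example, an agent that stops you from secretly ordering piles of figurines on Amazon behind my back...

Roboko

I can only see a future where I'm the one who gets scolded!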

⚠️ This article includes content produced by generative AI and may contain hallucinations.

Programming AI
