Applications and Enhancement of LLM Agents in Multiple Environments (2025)

Contents
01 Large Language Models and Agents
02 Multimodal Embodied Agents
03 Reasoning-Intensive Agents
04 Agents for Science

Part 01 — LLMs and Agents: definitions, frameworks, and challenges

What is an agent? "Give me the definition of 'Agent'."

"An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators." — Stuart J. Russell and Peter Norvig

In short: "An agent is a system that can help complete tasks intelligently."

Just Do It — but how? We need tools.

Just Say It — but still, how? We need tools: tools act as subagents of the agent system, with a mastermind behind the scenes. To play that coordinating role, we need an LLM.

LLM-based natural language processing supplies both halves of the language interface: natural language understanding (NLU) and natural language generation (NLG). The LLM sits between the environment and the tools, closing the loop of perception and action.

The agent system framework: a Sensor perceives input from the dynamic environment state and from user inputs; the Backbone thinks over that input, issuing queries and receiving feedback; an Actor acts, producing output responses; World Memory holds static data and is read and written throughout; a World Model predicts upcoming states; errors flow back into the loop, and a Cost module accounts for the expense of each step.

Some existing agents: Mobile ALOHA, AI Scientist, AlphaGeometry, Cradle, GPT-4V, GPT-3.5, DeepMind SIMA, Generative Agents, Voyager, ThinkThrice, M3A with AndroidWorld, AndroidControl, Data Interpreter, OPEx.

A taxonomy of agents along Cognition, Perception, and Action covers the same systems: Generative Agents, GPT-3.5, Mobile ALOHA, OPEx, AlphaGeometry, Cradle, DeepMind SIMA, GPT-4V, Data Interpreter, Voyager.
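The sensor–backbone–actor loop described above can be sketched as a minimal program. Everything here (class names, the `backbone` stand-in for the LLM, the `act_on` output format) is illustrative, not taken from any cited system:

```python
# Minimal sketch of the Sensor -> Backbone -> Actor loop with a World Memory.
from dataclasses import dataclass, field

@dataclass
class Memory:
    log: list = field(default_factory=list)      # Write: past percepts/actions
    def write(self, item): self.log.append(item)
    def read(self, n=3): return self.log[-n:]    # Read: recent context

def backbone(percept, context):
    """Stand-in for the LLM 'Think' step: map percept + memory to an action."""
    return f"act_on({percept})"

def run_agent(env_states):
    mem, trace = Memory(), []
    for state in env_states:                 # Sensor: percept from env state
        action = backbone(state, mem.read())  # Backbone: think
        mem.write((state, action))            # Memory: write
        trace.append(action)                  # Actor: act / respond
    return trace

print(run_agent(["door_closed", "door_open"]))
```

A real system would replace `backbone` with an LLM call and feed action errors back into the loop, as the framework's Error arrows suggest.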

Core challenges for LLM agents, organized by the same framework:

Perception
- How to represent and align multimodal input signals?
- How to achieve real-time perception in dynamic settings?
- How can agents handle incomplete or noisy data robustly?

Cognition
- How to perform complex tasks?
- How to deal with unseen tasks?
- How to learn and utilize domain knowledge?

Action
- How to effectively execute actions?
- How to search a huge action space?
- How to design and evolve tools?

Research trajectory and recent results follow.

Part 02 — Multimodal Embodied Agents: how far are we from an embodied AI assistant?

What we want vs. what we get — why the gap?

Case study: Embodied Instruction Following (EIF), with a demo.

OPEx: A Component-Wise Analysis.
Reference: Shi et al., "OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following", ACL 2024.

Insights from OPEx: we need to strengthen multimodal perception.

Document understanding: in most current document-LLM benchmarks, the answer comes solely from the text rendered in the image.
Visual question answering (VQA): the question can often be answered without the image at all.
Scene text recognition: the text can often be read off directly by OCR, with no natural-language knowledge required.
Visual Caption Restoration (VCR): "What is the covered text in the image? Restore the covered text without outputting explanations." Here the answer depends on the image, the pixel-level text hints, and the natural-language question together. Its advantage: the task forces alignment among all three — the image, the text embedded in the image, and natural language.
Reference: Zhang et al., "VCR: Visual Caption Restoration", arXiv preprint (2024).

Humans have sophisticated machinery for visual completion: we are remarkably good at recognizing partially occluded objects, and we recruit distinct brain regions to do it. Recognizing occluded objects involves complex coordination across areas tied to vision and perceptual control.
Fyall, Amber M., et al. "Dynamic Representation of Partially Occluded Objects in Primate Prefrontal and Visual Cortex." eLife, 2017.
Li, Bao, et al. "Brain Functional Representation of Highly Occluded Object Recognition." Brain Sciences 13(10), 2023.

What makes the VCR task distinctive:
- Motivation: it addresses the weaknesses of VQA and scene text recognition noted above.
- Multimodal recognition cannot be bypassed: the image information is required to answer correctly.
- OCR cannot solve it: the occluded text retains only pixel-level hints, beyond what OCR can recover.
- Unique answers: unlike masked language modeling, the occluded text in VCR is uniquely determined by the exposed pixel hints, so plain accuracy (ACC) serves as the evaluation metric.
- Design philosophy: treat text embedded in images as a third modality. Such text behaves differently from both natural-language strings and ordinary image content, and deserves separate handling in multimodal models.
- Flexible construction along several axes: which spans to occlude (specific tokens, sentences, n-grams, or POS tags); how much to occlude (the height of the blank box over the text controls difficulty); and whether to pair the text image with an accompanying picture, to probe how the extra image affects the model.

VCR-Wiki: a dataset built from Wikipedia for the VCR task. It comes in two languages (Simplified Chinese and English) and two difficulties (easy, which OCR still cannot solve, and hard, which keeps only 1–2 pixels of the occluded text above and below), with train/validation/test splits; the training set can serve as SFT data for multimodal large models.
Reference: Zhang et al., "VCR: Visual Caption Restoration", arXiv preprint (2024).

VCR-Wiki construction pipeline:
- Data cleaning: start from the wit-base dataset; drop entries without Chinese/English text and filter out sensitive entries.
- Text processing: with a fixed font and size, keep the first five lines of each Wikipedia introduction; use spaCy to select 5-grams containing no punctuation, digits, or person/place names as occlusion targets; discard entries with no valid target.
- Text-embedded image (TEI): render the text and occlude the targets to a degree set by the difficulty level.
- Stitching: concatenate the visual image (VI) with the TEI and scale to 300 px wide; discard entries taller than 900 px.

English and Chinese VCR examples (figures in the original slides).
Reference: Zhang et al., "VCR: Visual Caption Restoration", arXiv preprint (2024).
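The occlusion-target selection step above can be approximated in a few lines. The actual pipeline uses spaCy POS/NER filters; this stdlib-only sketch substitutes crude character checks (`isalpha` to exclude digits and punctuation, a lowercase first letter as a rough stand-in for "not a named entity"):

```python
# Stdlib-only approximation of VCR-Wiki 5-gram occlusion-target selection.
def candidate_5grams(text):
    """Return 5-grams of purely alphabetic, lowercase-initial tokens
    (no digits, punctuation, or capitalized words)."""
    tokens = text.split()
    ok = lambda t: t.isalpha() and t[0].islower()
    grams = []
    for i in range(len(tokens) - 4):
        window = tokens[i:i + 5]
        if all(ok(t) for t in window):
            grams.append(" ".join(window))
    return grams

text = "The quick brown fox jumps over the lazy dog near Paris in 1999"
print(candidate_5grams(text))
```

Windows touching "The", "Paris", or "1999" are rejected, mirroring the paper's filters in spirit if not in mechanism.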

Experimental observations: on VCR-Wiki, proficient speakers of the respective language exceed 90% accuracy at both difficulties, leaving a large gap between current models and humans. Open-source models are overall weaker than closed-source ones, but some open models (e.g., CogVLM2, at 19B parameters) far outperform similarly sized open models and even some closed models. Although VCR looks simple, it stresses resolution compression (preserving pixel-level text hints), reasoning (inferring the covered text from context), and commonsense use (Wikipedia content should already appear in every model's training data, yet performance remains poor), so there is still a long way to go. We continue to track new models on VCR-Wiki, aiming to establish VCR as a standard evaluation for vision-language models (VLMs).
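Because each occluded span is uniquely determined by its pixel hints, plain exact-match accuracy suffices as the metric. A minimal sketch — the whitespace and case normalization here is our own choice, not specified by the paper:

```python
# Exact-match accuracy for VCR-style restoration tasks.
def vcr_accuracy(predictions, references):
    """Fraction of predictions matching the unique reference exactly,
    after collapsing whitespace and lowercasing (our normalization)."""
    norm = lambda s: " ".join(s.split()).lower()
    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

print(vcr_accuracy(["the lazy  dog", "brown cat"], ["The lazy dog", "brown fox"]))
```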

Summary — multimodal and embodied agents: OPEx shows that current LLM-centric embodied agents are bottlenecked by the representation of multimodal perception and the execution of embodied actions; the VCR task helps improve multimodal representation learning and can become one of the standard evaluation tasks for vision-language models (VLMs).

Part 03 — Reasoning-Intensive Agents: the "body" still needs work, but how good is the "brain"? Here, people themselves are part of the environment, bringing diverse complexity.

Jubensha (scripted murder-mystery game) flow: select a script; assign roles to players; players read their scripts; collect clues for reasoning; hold a group discussion to find the murderer; vote to decide the murderer.
(a) Game flow of the Jubensha detective game.

Example exchange:
"Please introduce your role first, and then describe what you know about Mrs. Yang, an important person in the case, and …"
"I am Consultant Wenqi. As Mrs. Yang's personal consultant, I … On the luxury cruise ship Word of the Sea, she often …"
"Consultant Wenqi, did you notice the emergency on the ship on the night of the incident? What was your reaction?"
"Hello, Consultant Wenqi. According to your account, you mainly spent time on the ship with Mrs. …"
"Hello, Consultant Wenqi. From your description, it seems that you have a very close relationship …"
(b) Example of group discussion.
Reference: Wu et al., "Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games", ACL Findings (2024).

What the game demands of an agent:
- handling partial and incomplete information
- role-playing and situational adaptation
- long-horizon memory and information management
- balancing collaboration and competition
- multimodal information processing
- language understanding and generation
- complex reasoning and decision-making
- multi-player interaction

Goal: reduce hallucination and strengthen reasoning.

ThinkThrice — think three times before acting:
Step 1. Memory Retrieval: generate an initial answer.
Step 2. Self-Refinement: increase answer details.
Step 3. Self-Verification: reduce hallucination.
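The three steps above can be sketched as a prompt chain. The `llm` callable, the keyword-overlap retrieval, and all prompt strings are placeholders, not the paper's actual prompts:

```python
# Sketch of the three ThinkThrice stages as a chain of LLM calls.
def think_thrice(llm, question, memory):
    # Step 1. Memory Retrieval: fetch relevant notes, draft an initial answer.
    notes = [m for m in memory if any(w in m for w in question.split())]
    draft = llm(f"Answer '{question}' using notes: {notes}")
    # Step 2. Self-Refinement: ask the model to add missing detail.
    refined = llm(f"Add relevant details to: {draft}")
    # Step 3. Self-Verification: keep only claims the notes support,
    # cutting hallucinated content before committing to the answer.
    return llm(f"Keep only claims supported by {notes}: {refined}")

fake_llm = lambda prompt: f"[{prompt[:24]}...]"  # stand-in model
print(think_thrice(fake_llm, "who met Mrs. Yang?", ["Wenqi met Mrs. Yang on the ship"]))
```

The design point is that each stage consumes the previous stage's output, so verification sees both the refined answer and the retrieved evidence.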

Dataset: we collected 1,115 Jubensha game instances. Each game can have 1 to 20 players, and a game's token count can reach up to 518,000, enabling further research on socially intelligent AI. The dataset is available on request, and the code is open source.

Experimental setup: we measure information-acquisition ability and complex-reasoning ability.

Results and observations:
- The more relevant information an agent collects, the better it solves the case through reasoning.
- Given the same information, the LLM's inherent reasoning ability determines the agent's performance.
- "Thinking thrice" significantly improves reasoning (MR: Memory Retrieval; SR: Self-Refinement; SV: Self-Verification).

Summary — complex-reasoning agents: the many demands of Jubensha make it a good benchmark for measuring a broad range of agent capabilities; ThinkThrice verifies and strengthens reasoning results through memory retrieval, detail enhancement, and reflective revision, improving the accuracy of complex reasoning and reducing the impact of hallucination.

Part 04 — Agents for Science: another kind of complexity — knowledge. How do we let an agent acquire expert knowledge and solve complex tasks?

Key 1: learn the expert knowledge. HoneyBee is the first billion-parameter-scale language model specialized in materials science.
Reference: Song et al., "HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science", Findings of EMNLP 2023.

MatSci-Instruct and the HoneyBee training workflow: an Instructor takes a task/topic plus sampled arXiv text and produces <Instruction>/<Input>/<Output> examples; a Verifier is prompted to "evaluate this example by …" accuracy, reasonableness, completeness, and relevance; an Evaluator is prompted to "evaluate the model by …" accuracy, completeness, and reasonableness, yielding model-capability scores along those same dimensions of the output.
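The Instructor-to-Verifier filtering stage above can be sketched as follows. The scoring dimensions come from the slide; the threshold, toy scorer, and sample data are invented for illustration:

```python
# Sketch of verifier-filtered instruction data, MatSci-Instruct style.
def verify(example, scorer, threshold=0.75):
    """Score an example on the four slide dimensions; accept if the mean
    clears the (illustrative) threshold."""
    dims = ["accuracy", "reasonableness", "completeness", "relevance"]
    scores = {d: scorer(example, d) for d in dims}
    return sum(scores.values()) / len(dims) >= threshold, scores

def build_dataset(candidates, scorer):
    """Keep only the instruction examples the verifier accepts."""
    return [ex for ex in candidates if verify(ex, scorer)[0]]

# Toy scorer: rewards outputs that actually discuss the topic.
toy_scorer = lambda ex, dim: 1.0 if "band gap" in ex["output"] else 0.5
pool = [{"instruction": "Define band gap", "output": "A band gap is ..."},
        {"instruction": "Define band gap", "output": "No idea."}]
print(len(build_dataset(pool, toy_scorer)))
```

In the real workflow both the scorer and the generator are LLMs, and surviving examples feed the next round of progressive fine-tuning.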

MatSci-Instruct drives progressive instruction fine-tuning. Experimental results:
- Instruction data produced by our generator improves model performance.
- The verifier stage further improves data quality.
- HoneyBee improves progressively across rounds.

We evaluate HoneyBee on the MatSci-Instruct benchmark. In the low-resource setting it outperforms BERT-family models; in the zero-shot setting it outperforms LLaMA-family models.

Key 2: make good use of external tools. HoneyComb covers tool construction and tool use. In our experiments, every HoneyComb-based model improves accuracy significantly on MaScQA and SciQA; the overall trend shows that HoneyComb substantially boosts model performance, with clear gains for LLaMA-3 and HoneyBee.
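In the same spirit, tool-augmented answering can be sketched as a small registry plus a routing rule. The registry, the routing heuristic, and the example tool below are our own illustration, not HoneyComb's actual design:

```python
# Toy tool hub: register tools, route queries to them, fall back to the LLM.
TOOLS = {}

def tool(name):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("unit_convert")
def ev_to_joule(ev: float) -> float:
    """Convert electronvolts to joules."""
    return ev * 1.602176634e-19  # exact CODATA elementary-charge value

def answer(query: str):
    """Route to a tool if one is named in the query; else defer to the LLM."""
    for name, fn in TOOLS.items():
        if name in query:
            return fn(float(query.split()[-1]))  # toy argument parsing
    return "LLM fallback"

print(answer("what is a zeolite"))
```

The point of the pattern is that domain tools handle the computations LLMs get wrong, while the model keeps handling open-ended questions.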

Key 3: simplify the complex and plan holistically. Data Interpreter lets an agent handle data-science work end to end: data analysis, visualization, anomaly handling, automated feature engineering, and model training and validation, all under explicit task planning.

Uncontrolled task planning (DAG).
Problem: in data-science tasks, LLMs must cope with real-time data changes, complex dependencies, and logic-error detection, demands that existing plan-generation methods cannot satisfy.
Analysis: static planning does not fit dynamic data-dependency scenarios, and logical rigor is hard to guarantee; plans must adapt flexibly to task changes and manage dependencies efficiently.
Solution: adopt dynamic hierarchical planning, building task and action graphs that adapt to data changes; dynamically update task states and code; correct logic errors with self-debugging and human editing; and improve reasoning accuracy through confidence scoring and verification.

Hierarchical graphs — manage complexity by layering: (a) an organized task-and-action graph showing the workflow of a high-level machine-learning project, including the task dependencies and action sequences needed to reach the project goal; (b) the task DAG for a machine operating-state prediction problem as an example. The task graph holds the decomposed planned tasks, while the action graph (also called the execution graph) executes each node according to the planned task graph; the execution code for each node is generated by the LLM.

Uncontrolled tool integration and generation.
Problem: in complex data-science tasks, hand-writing tools is hard; tools must be integrated and generated automatically.
Analysis: generating complex tool code directly with an LLM is challenging, and existing tool recommendation and organization are inefficient.
Solution: Data Interpreter dynamically recommends and organizes tools, uses the LLM's understanding of tool functions embedded in context to adjust parameters, integrates multiple tools and auto-generates code snippets, iteratively optimizes the tool library with execution feedback, and enforces logical consistency and code efficiency.

Tools of Data Interpreter (figure).
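Executing a task DAG in topological order can be sketched with the standard library's `graphlib`. The task names and the executor are illustrative; in the real system the LLM generates each node's code, and replanning rewrites the graph on failure:

```python
# Sketch of task-DAG execution in the spirit of the hierarchical graphs above.
from graphlib import TopologicalSorter

dag = {                       # node -> prerequisite nodes
    "eda": set(),
    "feature_eng": {"eda"},
    "train": {"feature_eng"},
    "validate": {"train"},
}

def run(dag, execute):
    order, results = list(TopologicalSorter(dag).static_order()), {}
    for node in order:
        # Dynamic replanning / self-debugging would hook in here on failure.
        results[node] = execute(node, [results[d] for d in dag[node]])
    return order, results

order, _ = run(dag, lambda node, deps: f"done:{node}")
print(order)
```

Topological ordering guarantees every node sees its dependencies' results, which is exactly the property the task graph is meant to enforce.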

Uncontrolled code generation — automated verification.
Problem: even code that runs without errors may hide logic flaws, so the correctness of its output must be verified.
Analysis: relying on exception detection alone cannot guarantee task completion; logical rigor needs deeper verification.
Solution: Data Interpreter uses ACV (Automated Confidence-based Verification): it generates verification code that simulates the task's logic, computes a confidence score from the execution results, and selects the most reliable answer on that basis, avoiding logic errors and improving reasoning accuracy. A task from MATH illustrates the confidence-based verification flow: the dashed box contains the automated verification process, below which multiple candidate answers are ranked by the verification results. Results on math tasks show the benefit.

Uncontrolled code generation — experience pool & automated De…
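The confidence-based selection idea can be sketched as running verification checks per candidate and ranking by pass rate. The checks here are toy stand-ins for the verification code an LLM would generate:

```python
# Sketch of ACV-style candidate ranking: score each candidate answer by
# the fraction of verification checks it passes, then pick the best.
def confidence(candidate, checks):
    passed = sum(1 for check in checks if check(candidate))
    return passed / len(checks)

def select_answer(candidates, checks):
    """Return the highest-confidence candidate and all confidence scores."""
    ranked = sorted(candidates, key=lambda c: confidence(c, checks), reverse=True)
    return ranked[0], {c: confidence(c, checks) for c in candidates}

# Toy task: which candidate satisfies x*x == 49 with x > 0?
checks = [lambda x: x * x == 49, lambda x: x > 0]
best, scores = select_answer([7, -7, 6], checks)
print(best, scores)
```

Ranking by verification pass rate, rather than trusting a single generated answer, is what lets the method filter out solutions that execute cleanly but are logically wrong.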
