Applications and Enhancement of LLM Agents in Multiple Environments
2025

CONTENTS
01 Large Models and Agents
02 Multimodal Embodied Agents
03 Reasoning-Intensive Agents
04 Scientific-Domain Agents

LLM and Agents: Definition, Framework, and Challenges

What is an agent?
"Give me the definition of 'Agent'."

The definition of an agent
"An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators." (Stuart J. Russell and Peter Norvig)

In short...
"An agent is a system that can help complete tasks intelligently."

Just Do It. But how? We need tools.
Just Say It. But how? We need tools, with tools acting as subagents of an agent system.
The mastermind behind the scenes. But how? We need an LLM.

LLM-based natural language processing
[Figure: the LLM provides natural language understanding (NLU) and natural language generation (NLG). Labels: LLM, Env, Tools, Perception, Action.]

Agent system framework
[Figure: agent system framework. Components: World, Sensor, Percept, Backbone, Memory, World Model, Cost, Actor. Labeled flows: Input, Output, Think, Act, Response, Read, Write, Query, Predict, Error, Feedback. World inputs: dynamic environment state, static data, user inputs.]

Some existing agents
[Figure: Mobile ALOHA, AI Scientist, AlphaGeometry, Cradle, GPT-4V, GPT-3.5, DeepMind SIMA, Generative Agents, Voyager, ThinkThrice, M3A with AndroidWorld, AndroidControl Agent, Data Interpreter, OPEx.]

Classification of agents
[Figure: the same agents arranged by Cognition, Perception, and Action.]

Some core challenges of LLM agents
[Figure: the agent system framework, repeated.]
- How to represent and align multimodal input signals?
- How to achieve real-time perception in dynamic settings?
- How can agents handle incomplete or noisy data robustly?
- How to perform complex tasks?
- How to deal with unseen tasks?
- How to learn and utilize domain knowledge?
- How to effectively execute actions?
- How to search a huge action space?
- How to design and evolve tools?
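The framework figure names the components but not how they are wired together. For illustration only, the sketch below assumes one plausible perceive, think, act loop. A sensor normalizes raw input into a percept. A backbone (a stub function standing in for the LLM) reads memory and picks a tool. Tools act as subagents that the actor dispatches to, and an unknown tool produces an error observation instead of a crash. All names (`Agent`, `stub_backbone`, `calculator`) are hypothetical and are not taken from any of the systems listed above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A tool is modeled as a subagent: it takes an argument string and returns an observation.
Tool = Callable[[str], str]

@dataclass
class Agent:
    # backbone: (percept, memory) -> (tool name, tool argument); stands in for an LLM
    backbone: Callable[[str, List[str]], Tuple[str, str]]
    tools: Dict[str, Tool]
    memory: List[str] = field(default_factory=list)

    def perceive(self, raw_input: str) -> str:
        # Sensor: turn raw input into a normalized percept.
        return raw_input.strip().lower()

    def step(self, raw_input: str) -> str:
        percept = self.perceive(raw_input)
        self.memory.append(f"percept: {percept}")             # memory write
        tool_name, arg = self.backbone(percept, self.memory)  # think (may read memory)
        if tool_name not in self.tools:
            observation = f"error: unknown tool {tool_name!r}"  # error feedback path
        else:
            observation = self.tools[tool_name](arg)          # act through a subagent
        self.memory.append(f"observation: {observation}")
        return observation

def stub_backbone(percept: str, memory: List[str]) -> Tuple[str, str]:
    # Hypothetical routing rule in place of an LLM call.
    if percept.startswith("compute "):
        return "calculator", percept[len("compute "):]
    return "echo", percept

def calculator(expr: str) -> str:
    # Deliberately tiny: only handles "a + b" with integers.
    a, b = expr.split("+")
    return str(int(a) + int(b))

agent = Agent(backbone=stub_backbone, tools={"calculator": calculator, "echo": lambda s: s})
print(agent.step("Compute 2 + 3"))  # 5
print(agent.step("Hello"))          # hello
print(len(agent.memory))            # 4
```

Keeping tools behind a single `name -> callable` mapping lets a real LLM backbone be swapped in without changing the loop. That swap is the role the slides assign to the LLM.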
Research Trajectory and Recent Results

02 Multimodal Embodied Agents

How far are we from embodied intelligent assistants?
What We Want vs. What We Get. Why?

Case Study: Embodied Instruction Following (EIF)
[Demo]
Reference: Shi et al. "OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following", ACL 2024.

OPEx: A Component-Wise Analysis

Insights from OPEx

We Need to Strengthen Multimodal Perception

Document understanding (Doc Understanding): in most current document-LLM evaluations, the answer comes only from the text portion of the image.
Visual question answering (VQA): questions can sometimes be answered without any image information.
Scene text recognition (STR): text images can often be read directly by OCR, with no need for natural-language knowledge.

Visual Caption Restoration (VCR)
Example question: "What is the covered text in the image? Restore the covered text without outputting any explanation."
Reference: Zhang et al. "VCR: Visual Caption Restoration", arXiv preprint (2024).
The answer depends on the image, the text image, the pixel-level text cues, and the natural-language question. Advantage: the task requires aligning images, text in images, and natural language.

Complex human mechanisms for visual completion
- Humans are very good at recognizing partially occluded objects.
- Humans use different brain regions during recognition: recognizing occluded objects involves complex coordination among brain regions related to vision and perceptual control.
References: Fyall, Amber M., et al. "Dynamic Representation of Partially Occluded Objects in Primate Prefrontal and Visual Cortex." eLife, 2017. Li, Bao, et al. "Brain Functional Representation of Highly Occluded Object Recognition." Brain Sciences 13 (10), 2023.

What makes the VCR task unique
Design motivation: address the problems of VQA and scene text recognition.
- Multimodal recognition cannot be avoided: image information is required to answer correctly.
- It cannot be solved by OCR: the covered text keeps only pixel-level cues, which OCR cannot recognize.
- It has a unique answer: unlike masked language modeling, the covered text in VCR is uniquely determined by the exposed pixel-level cues, so accuracy (ACC) can be used as the evaluation metric.
Design philosophy: treat text in images as a third modality. Text in images has characteristics that differ from both string-type natural language and ordinary images, so multimodal models should consider it separately.
Design implementation: high flexibility in image/text sources and difficulty.
- Flexible choice of masked text: specific tokens, sentences, n-grams, or POS tags can be masked.
- Flexible mask region: task difficulty is controlled by adjusting the height of the blank box that covers the text region.
- Flexible image construction: the text portion can be paired with or without an accompanying image, to study how the extra image affects the model.

VCR-Wiki dataset construction
We built VCR-Wiki, a dataset for the VCR task based on Wikipedia.
- Two languages: Simplified Chinese and English.
- Two difficulty levels: easy (a difficulty at which OCR cannot complete the task) and hard (only 1 to 2 pixels are kept above and below the covered text).
- Training, validation, and test splits; the training set can serve as SFT data for multimodal large models.
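To make the difficulty knob and the metric concrete, the sketch below covers a rendered text line with a blank box and leaves a few pixel rows visible at the band's top and bottom edges. It then scores predictions by exact match, which is usable because each covered span has a unique answer. The sketch makes three assumptions: the text-embedded image is a grayscale grid (rows of ints, 0 = ink, 255 = background), the row span of the text line is already known, and the function names and `keep_rows` parameter are illustrative only, not the VCR-Wiki code. The box height is `bottom - top - 2 * keep_rows`. Here `keep_rows=1` or `2` mirrors the hard setting described above, and larger values expose more of each glyph.

```python
from typing import List

Image = List[List[int]]  # grayscale rows; 0 = ink, 255 = white background

def mask_text_band(img: Image, top: int, bottom: int, keep_rows: int, fill: int = 255) -> Image:
    """Cover rows [top, bottom) of a text line with a blank box,
    leaving `keep_rows` rows visible at the top and bottom edges."""
    if keep_rows < 0 or 2 * keep_rows > bottom - top:
        raise ValueError("keep_rows must fit inside the text band")
    out = [row[:] for row in img]  # copy so the source image is untouched
    for r in range(top + keep_rows, bottom - keep_rows):
        out[r] = [fill] * len(out[r])
    return out

def exact_match_accuracy(predictions: List[str], answers: List[str]) -> float:
    """Share of predictions that equal the reference after trimming whitespace."""
    if len(predictions) != len(answers):
        raise ValueError("length mismatch")
    if not answers:
        return 0.0
    hits = sum(p.strip() == a.strip() for p, a in zip(predictions, answers))
    return hits / len(answers)

# A 6-row, 3-pixel-wide "text line" that is entirely ink.
line = [[0, 0, 0] for _ in range(6)]
hard = mask_text_band(line, top=0, bottom=6, keep_rows=1)
print([row[0] for row in hard])  # [0, 255, 255, 255, 255, 0]
print(exact_match_accuracy(["the cat sat", "on a mat"], ["the cat sat", "on the mat"]))  # 0.5
```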
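VCR-Wiki dataset construction pipeline
- Data cleaning: starting from the wit-base dataset, remove entries without zh/en content and some sensitive entries.
- Text processing: with a given font and font size, keep the first 5 lines of each Wikipedia introduction; use spaCy to select 5-grams that contain no punctuation, numbers, person names, place names, and so on as masking targets; filter out every entry that has no masking target.
- Text-embedded image (TEI) construction: cover the masking targets to different degrees according to the difficulty level.
- Concatenation: concatenate the visual image (VI) with the TEI and resize to 300 px wide; filter out every entry taller than 900 px.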
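Two steps of this pipeline are easy to misread, so here is a rough sketch of both. The candidate selection keeps 5-grams of whitespace tokens that are purely alphabetic. This is a simplified stand-in for the spaCy-based filter. It works only for whitespace-delimited text such as English, and it does not remove person or place names, which would need a named-entity recognizer. The size check assumes that the VI and TEI are stacked vertically and that each is scaled to 300 px wide with its aspect ratio preserved. The function names are hypothetical.

```python
from typing import List, Tuple

def candidate_5grams(text: str) -> List[str]:
    """Return 5-grams whose tokens are purely alphabetic (no punctuation or digits).
    Simplified stand-in for the spaCy filter: names and places are NOT removed here."""
    tokens = text.split()
    grams = []
    for i in range(len(tokens) - 4):
        window = tokens[i:i + 5]
        if all(tok.isalpha() for tok in window):
            grams.append(" ".join(window))
    return grams

def stacked_height(vi: Tuple[int, int], tei: Tuple[int, int], width: int = 300) -> int:
    """Height after scaling each (width, height) image to `width` px and stacking vertically."""
    return sum(round(h * width / w) for w, h in (vi, tei))

def keep_entry(vi: Tuple[int, int], tei: Tuple[int, int], max_height: int = 900) -> bool:
    # Entries taller than max_height after concatenation are filtered out.
    return stacked_height(vi, tei) <= max_height

print(candidate_5grams("The quick brown fox jumps over 2 lazy dogs."))
# ['The quick brown fox jumps', 'quick brown fox jumps over']
print(stacked_height((600, 400), (300, 500)))  # 700
print(keep_entry((600, 1200), (300, 400)))     # False
```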
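[Figure: English VCR examples.]
[Figure: Chinese VCR examples.]

Experimental observations
- On VCR-Wiki, proficient users of the corresponding language reach over 90% accuracy at both the easy and hard levels, while large models remain far below human level.
- Open-source models are weaker than closed-source models overall, but some open-source models, such as CogVLM2 with 19B parameters, substantially outperform open-source models of similar size as well as some closed-source models.
- Although VCR looks simple, it challenges model resolution compression (preserving the pixel-level text cues), reasoning (inferring the covered text from context), and use of common knowledge (Wikipedia content should already be in every model's training data, yet results remain poor). There is still a long way to go on this benchmark.
- We keep updating the results of new models on VCR-Wiki and hope to make VCR one of the standard evaluations for future vision-language models (VLMs).

Summary: multimodal and embodied agents
OPEx: the bottleneck of current LLM-centered embodied agents lies in representing multimodal perception and executing embodied actions.
VCR: the VCR task helps improve multimodal representation learning and can become one of the standard evaluation tasks for vision-language models (VLMs).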
03 Reasoning-Intensive Agents

The agent's "body" still needs work; how capable is its "brain"?

Humans are part of the environment.
Diverse complexity!

Jubensha (murder-mystery role-playing game)
[Figure (a): game flow of the Jubensha detective game. Stages: select script; assign roles to players; players read their scripts; collect clues for reasoning; group discussion to find out the murderer; vote to decide the murderer.]
[Figure (b): example of group discussion. Sample utterances: "Please introduce your role first, and then describe what you know about Mrs. Yang, an important person in the case, and …" / "I am Wenqi Consultant. As Mrs. Yang's personal consultant, I … . On the luxury cruise ship Word of the Sea, she often …" / "Consultant Wenqi, did you notice the emergency on the ship on the night of the incident? What was your reaction?" / "Hello, Consultant Wenqi. From your account, you mainly spent time on the ship with Mrs …" / "Hello, Consultant Wenqi. According to your description, it seems that you have a very close relationship …"]
Reference: Wu et al. "Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games", ACL Findings (2024).

Abilities the game exercises:
- Handling partial and incomplete information
- Role-playing and situational adaptability
- Long-range memory and information management
- Balancing cooperation and competition
- Multimodal information processing
- Language understanding and generation
- Complex reasoning and decision-making
- Multi-party interaction

Goal: reduce hallucination and enhance reasoning.

ThinkThrice: think thrice before acting
Step 1. Memory Retrieval: generate an initial answer.
Step 2. Self-Refinement: add details to the answer.
Step 3. Self-Verification: reduce hallucination.
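A minimal sketch of the three ThinkThrice steps follows. It assumes a generic `llm(prompt) -> str` callable and a list of memory snippets. The prompts, the word-overlap retrieval, and all function names are illustrative assumptions, not the paper's implementation. The stub LLM at the bottom exists only so that the example runs and the three stages are visible.

```python
from typing import Callable, List

LLM = Callable[[str], str]

def retrieve(memory: List[str], question: str, k: int = 2) -> List[str]:
    # Hypothetical retrieval: rank memory snippets by word overlap with the question.
    q = set(question.lower().split())
    ranked = sorted(memory, key=lambda m: len(q & set(m.lower().split())), reverse=True)
    return ranked[:k]

def think_thrice(llm: LLM, memory: List[str], question: str) -> str:
    context = "\n".join(retrieve(memory, question))
    # Step 1. Memory Retrieval: draft an initial answer from the retrieved memories.
    answer = llm(f"Context:\n{context}\nQuestion: {question}\nAnswer:")
    # Step 2. Self-Refinement: ask the LLM to add supporting details.
    answer = llm(f"Context:\n{context}\nAdd supporting details to: {answer}")
    # Step 3. Self-Verification: ask the LLM to drop claims the context does not support.
    answer = llm(f"Context:\n{context}\nRemove any claim not supported by the context: {answer}")
    return answer

calls: List[str] = []

def stub_llm(prompt: str) -> str:
    calls.append(prompt)  # record prompts so the three stages can be inspected
    return f"answer-v{len(calls)}"

mem = ["Mrs. Yang boarded the ship on Monday", "The storm began at night", "Wenqi is her consultant"]
print(think_thrice(stub_llm, mem, "Who is Mrs. Yang's consultant?"))  # answer-v3
print(len(calls))  # 3
```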
Dataset
We collected 1,115 Jubensha game instances. Each game can have 1 to 20 players, and a game's token count can reach 518,000, enabling further research on socially intelligent AI.
Data access requests and open-source code

Experimental setup
Evaluated abilities: information acquisition and complex reasoning.

Results and observations
- The more relevant information an agent collects, the better it solves the problem through reasoning.
- Given the same information, the LLM's inherent reasoning ability determines how well the agent solves the problem.
- ThinkThrice significantly improves reasoning ability. (MR: Memory Retrieval; SR: Self-Refinement; SV: Self-Verification.)

Summary: complex-reasoning agents
Jubensha: the many characteristics of the Jubensha game make it a good evaluation task that measures many agent abilities.
ThinkThrice: verifies and strengthens reasoning results through memory retrieval, detail enhancement, and reflective revision, which improves the accuracy of complex reasoning and reduces the impact of hallucination.
04 Scientific Agents

Another kind of complexity: knowledge
How can an agent acquire expert knowledge to solve complex tasks?

Key 1: Learning expert knowledge
HoneyBee
HoneyBee is the first billion-parameter-scale language model specialized in materials science.
Reference: Song et al. "HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science", accepted by Findings of EMNLP 2023.

MatSci-Instruct and HoneyBee training workflow
[Figure: roles Instructor, Verifier, and Evaluator, and the model HoneyBee. The Instructor produces <Instruction><Input><Output> samples for a task/topic (sample source: arXiv). The Verifier is prompted to "Evaluate this example by …" on accuracy, reasonableness, completeness, and relevance. The Evaluator is prompted to "Evaluate the model by …" on accuracy, completeness, and reasonableness, giving a model capability evaluation: accuracy of output …, reasonableness of output …, completeness of output …, …]
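The generate-then-verify part of this workflow can be sketched as follows. This is an illustrative sketch under our own assumptions, not the MatSci-Instruct code. The instructor and verifier are plain Python callables standing in for LLMs. Each verifier score is assumed to lie in [0, 1] on the four dimensions named in the figure. The acceptance threshold of 0.8 on the mean score is a made-up value. The fine-tuning and the Evaluator stage are omitted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

DIMENSIONS = ("accuracy", "reasonableness", "completeness", "relevance")

@dataclass
class Sample:
    instruction: str
    input_text: str
    output: str

Instructor = Callable[[str], List[Sample]]       # topic -> candidate samples
Verifier = Callable[[Sample], Dict[str, float]]  # sample -> score per dimension

def build_instruction_set(topics: List[str], instructor: Instructor,
                          verifier: Verifier, threshold: float = 0.8) -> List[Sample]:
    """Keep only generated samples whose mean verifier score reaches `threshold`."""
    kept = []
    for topic in topics:
        for sample in instructor(topic):
            scores = verifier(sample)
            if mean(scores[d] for d in DIMENSIONS) >= threshold:
                kept.append(sample)
    return kept

# Toy stand-ins for the LLM roles (hypothetical).
def toy_instructor(topic: str) -> List[Sample]:
    return [Sample(f"Explain {topic}.", "", f"{topic} is ..."),
            Sample(f"Define {topic}.", "", "")]  # empty output, should be rejected

def toy_verifier(s: Sample) -> Dict[str, float]:
    complete = 1.0 if s.output else 0.0
    return {"accuracy": 0.9, "reasonableness": 0.9, "completeness": complete, "relevance": 0.9}

data = build_instruction_set(["band gap", "perovskite"], toy_instructor, toy_verifier)
print([s.instruction for s in data])  # ['Explain band gap.', 'Explain perovskite.']
```

Filtering on a mean score is the simplest choice. A per-dimension minimum would reject samples that are fluent but incomplete more aggressively.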
MatSci-Instruct: progressive instruction fine-tuning
Experimental results
- Instruction data generated by our generator improves model performance.
- The verifier further improves data quality.
- HoneyBee improves progressively.

Experimental results
We evaluate HoneyBee on the MatSci-Instruct benchmark.
- Low-resource setting: it outperforms BERT-family models.
- Zero-shot setting: it outperforms LLaMA-family models.
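Key 2: Making good use of external tools
HoneyComb: tool construction and use
Experimental results: all HoneyComb-based models show significantly higher accuracy on MaScQA and SciQA. The overall trend shows that HoneyComb substantially improves model performance, with clear gains for LLaMA-3 and HoneyBee.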
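Key 3: Simplify the complex and plan holistically
Data Interpreter: agents can handle data too
Capabilities: data analysis, visualization, anomalous-data handling, automated feature engineering, model training and validation.

Uncontrollable task planning: task planning (DAG)
Problem: in data-science tasks, LLMs must cope with real-time data changes, complex dependencies, and the need to detect logical errors, which existing plan-generation methods struggle to meet.
Analysis: static planning does not suit scenarios with dynamic data dependencies, and logical rigor is hard to guarantee; the planner must adapt flexibly to task changes and manage dependencies efficiently.
Solution: use dynamic hierarchical planning. Build a task-action graph that adapts to data changes, dynamically update task states and code, correct logical errors with self-debugging and human-editing strategies, and improve reasoning accuracy through confidence scoring and verification.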
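At the core of this planning scheme is a task graph: a directed acyclic graph in which each node runs only after its dependencies have finished. The sketch below executes such a graph with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). Several simplifications apply. The task names loosely follow the machine-operating-status example and are hypothetical. Each node's "code" is a stub function rather than LLM-generated code. Re-planning is reduced to marking failed tasks and the downstream tasks they block, which is where a real system would trigger self-debugging or a plan update.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

def run_task_graph(deps: Dict[str, Set[str]],
                   actions: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Execute tasks in dependency order. A task whose action fails is marked
    'failed'; a task with any unsuccessful dependency is marked 'blocked' and not run."""
    status: Dict[str, str] = {}
    for task in TopologicalSorter(deps).static_order():
        if any(status.get(d) != "done" for d in deps.get(task, set())):
            status[task] = "blocked"   # upstream problem: candidate for re-planning
        elif actions[task]():
            status[task] = "done"
        else:
            status[task] = "failed"    # would trigger self-debugging / plan update
    return status

# Hypothetical plan for a machine-status prediction task (node -> prerequisites).
deps = {
    "load_data": set(),
    "eda": {"load_data"},
    "feature_engineering": {"load_data"},
    "train_model": {"feature_engineering"},
    "evaluate": {"train_model", "eda"},
}
actions = {t: (lambda: True) for t in deps}
actions["feature_engineering"] = lambda: False  # simulate a failing node

print(run_task_graph(deps, actions))
```

Because `static_order()` only guarantees that dependencies come first, the print order of independent tasks such as `eda` and `feature_engineering` may vary. The resulting statuses do not.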
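Hierarchical Graph: simplify the complex, manage by levels
[Figure: hierarchical structure. (a) An organized task and action graph showing the workflow of a high-level machine-learning project, including the task dependencies and action sequences needed to reach the project goal. (b) A directed acyclic graph (DAG) of tasks, using a machine operating-status prediction problem as an example. The task graph shows the decomposed planned tasks, while the action graph (also called the execution graph) executes each node according to the planned task graph. The LLM converts each node into its execution code.]

Uncontrollable code generation: tool integration and generation
Problem: in complex data-science tasks, writing tools by hand is hard; tools need to be integrated effectively and generated automatically.
Analysis: generating complex tool code directly with an LLM is challenging, and existing ways of recommending and organizing tools are inefficient.
Solution: Data Interpreter dynamically recommends and organizes tools. It uses the LLM to understand each tool's function and embeds context to adjust parameters, integrates multiple tools, generates code snippets automatically, and iteratively refines the tool library through execution feedback, strengthening logical consistency and code efficiency.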
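The sketch below shows tool recommendation and parameter binding in their simplest form. It ranks a small, hypothetical tool library by word overlap between a task description and each tool's docstring. It then fills the chosen tool's parameters from a context dict using `inspect.signature`. Word overlap is a deliberately crude stand-in for the LLM-based understanding of tool functions described above, and none of the tool names come from Data Interpreter.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional

def fill_missing(values: List[Optional[float]], fill: float = 0.0) -> List[float]:
    """Replace missing values (None) in a numeric column."""
    return [fill if v is None else v for v in values]

def scale_minmax(values: List[float]) -> List[float]:
    """Scale a numeric column to the range zero to one."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

TOOLS: Dict[str, Callable[..., Any]] = {f.__name__: f for f in (fill_missing, scale_minmax)}

def recommend(task: str, tools: Dict[str, Callable[..., Any]], k: int = 1) -> List[str]:
    # Crude stand-in for LLM understanding: overlap between task words and docstring words.
    words = set(task.lower().split())

    def score(name: str) -> int:
        doc = (tools[name].__doc__ or "").lower().replace(".", " ")
        return len(words & set(doc.split()))

    return sorted(tools, key=score, reverse=True)[:k]

def call_with_context(fn: Callable[..., Any], context: Dict[str, Any]) -> Any:
    # Pass only the parameters that the tool declares and the context provides.
    params = inspect.signature(fn).parameters
    return fn(**{name: context[name] for name in params if name in context})

context = {"values": [3.0, None, 5.0], "fill": 4.0}
best = recommend("replace missing values in the column", TOOLS)[0]
print(best, call_with_context(TOOLS[best], context))  # fill_missing [3.0, 4.0, 5.0]
```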
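Tools of Data Interpreter

Uncontrollable code generation: automatic verification
Problem: even code that runs without errors may contain logical flaws, so the correctness of its output must be verified effectively.
Analysis: relying on exception detection alone cannot guarantee that the task is completed; the logical rigor needs deeper verification.
Solution: Data Interpreter adopts ACV (Automated Confidence-based Verification). It generates verification code that simulates the task's logical process, computes a confidence score from the execution results, and selects the most reliable answer accordingly, which avoids logical errors and improves reasoning accuracy.
[Figure: the confidence-based automatic verification process, illustrated on a task from MATH. The dashed box contains the automatic verification process; below it, multiple candidate answers are ranked according to the verification.]

Comparison of results on math tasks

Uncontrollable code generation: experience pool & automatic De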
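The confidence-based selection in ACV, as described in the automatic-verification slides, can be sketched as follows. The assumptions here are ours, not the paper's. Each candidate answer is run through a list of verification functions that stand in for LLM-generated verification code. The confidence score is simply the fraction of checks that pass. Ties are broken by how many candidates agree on the same answer. The example uses the toy equation 2x + 3 = 11 instead of a real MATH problem.

```python
from collections import Counter
from typing import Callable, List, Tuple

Check = Callable[[float], bool]

def confidence(answer: float, checks: List[Check]) -> float:
    """Fraction of verification checks the answer passes (a crash counts as a fail)."""
    passed = 0
    for check in checks:
        try:
            passed += bool(check(answer))
        except Exception:
            pass
    return passed / len(checks) if checks else 0.0

def select_answer(candidates: List[float], checks: List[Check]) -> Tuple[float, float]:
    votes = Counter(candidates)
    # Rank distinct candidates by confidence first, then by agreement among candidates.
    best = max(set(candidates), key=lambda a: (confidence(a, checks), votes[a]))
    return best, confidence(best, checks)

# Toy problem: solve 2x + 3 = 11. These checks simulate generated verification code.
checks: List[Check] = [
    lambda x: 2 * x + 3 == 11,        # substitute the answer back into the equation
    lambda x: float(x).is_integer(),  # this toy problem expects an integer answer
    lambda x: x > 0,                  # sign sanity check
]
print(select_answer([4.0, 7.0, 4.0, 3.5], checks))  # (4.0, 1.0)
```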