谷歌的安全AI智能體簡介 Googles Approach for Secure AI Agents An Introduction_第1頁
谷歌的安全AI智能體簡介 Googles Approach for Secure AI Agents An Introduction_第2頁
谷歌的安全AI智能體簡介 Googles Approach for Secure AI Agents An Introduction_第3頁
谷歌的安全AI智能體簡介 Googles Approach for Secure AI Agents An Introduction_第4頁
谷歌的安全AI智能體簡介 Googles Approach for Secure AI Agents An Introduction_第5頁
已閱讀5頁,還剩28頁未讀, 繼續(xù)免費閱讀

下載本文檔

版權(quán)說明:本文檔由用戶提供并上傳,收益歸屬內(nèi)容提供方,若內(nèi)容存在侵權(quán),請進行舉報或認(rèn)領(lǐng)

文檔簡介

GoogleMay2025

Google’sApproachforSecureAIAgents:

AnIntroduction

SantiagoDíaz,ChristophKern,KaraOlive

01

02

03

04

05

06

Introduction:thepromiseandrisksofAIagents

SecuritychallengesofhowAIagentswork

KeyrisksassociatedwithAIagents

Coreprinciplesforagentsecurity

Google’sapproach:ahybriddefense-in-depth

Navigatingthefutureofagentssecurely

ThispaperispartofourongoingefforttoshareGoogle’sbestpracticesforbuildingsecureAIsystems.ReadmoreaboutGoogle’sSecureAI

Frameworkatsaif.google.

01–Introduction:ThepromiseandrisksofAIagents

Introduction:ThepromiseandrisksofAIagents

WeareenteringaneweradrivenbyAIagents—AIsystemsdesignedtoperceivetheirenvironment,makedecisions,andtakeautonomousactionstoachieveuser-definedgoals.UnlikestandardLargeLanguageModels(LLMs)thatprimarilygeneratecontent,agentsact.TheyleverageAIreasoningtointeractwithothersystemsandexecutetasks,rangingfromsimpleautomationlikecategorizingincomingservicerequeststocomplex,multi-stepplanningsuchasresearchingatopicacrossmultiplesources,summarizingthefindings,anddraftinganemailtoateam.

Thisincreasingcapabilityandautonomypromisessignificantvalue,poten-tiallyreshapinghowbusinessesoperateandindividualsinteractwithtechnology.TherapiddevelopmentofagentframeworkslikeGoogle’sAgentDevelopmentKit1andopensourcetoolssuchasLangChainsignalsamovetowardwidespreaddeployment,suggesting“fleets”ofagentsoper-atingatscaleratherthanjustisolatedinstances.Atthesametime,thepromiseofagentsintroducesuniqueandcriticalsecuritychallengesthatdemandexecutiveattention.

Keyrisks:Rogueactionsandsensitivedatadisclosure

TheverynatureofAIagentsintroducesnewrisksstemmingfromseveralinherentcharacteristics.TheunderlyingAImodelscanbeunpredictable,astheirnon-deterministicnaturemeanstheirbehaviorisn’talwaysrepeatableevenwiththesameinput.Complex,emergentbehaviorscanarisethatweren’texplic-itlyprogrammed.Higherlevelsofautonomyindecision-makingincreasethepotentialscopeandimpactoferrorsaswellaspotentialvulnerabilitiestomaliciousactors.Ensuringalignment—thatagentactionsrea-sonablymatchuserintent,especiallywheninterpretingambiguousinstructionsorprocessinguntrustedinputs—remainsasignificanthurdle.Finally,therearechallengesinmanagingagentidentityandprivilegeseffectively.

ThesefactorscreatetheneedforAgentSecurity,aspecializedfieldfocusedonmitigatingthenovelrisksthesesystemspresent.Theprimaryconcernsdemandingstrategicfocusarerogueactions(unintended,harmful,orpolicy-violatingactions)andsensitivedatadisclosure(unauthorizedrevelationofprivateinfor-mation).Afundamentaltensionexists:increasedagentautonomyandpower,whichdriveutility,correlatedirectlywithincreasedrisk.

Traditionalsecurityparadigmsaloneareinsufficient

SecuringAIagentsinvolvesachallengingtrade-off:enhancinganagent’sutilitythroughgreaterautonomyandcapabilityinherentlyincreasesthecomplexityofensuringitssafetyandsecurity.Traditionalsystemssecurityapproaches(suchasrestrictionsonagentactionsimplementedthroughclassicalsoftware)lackthecontextualawarenessneededforversatileagentsandcanoverlyrestrictutility.Conversely,purelyreason-ing-basedsecurity(relyingsolelyontheAImodel’sjudgment)isinsufficientbecausecurrentLLMsremainsusceptibletomanipulationslikepromptinjectionandcannotyetoffersufficientlyrobustguarantees.Neitherapproachissufficientinisolationtomanagethisdelicatebalancebetweenutilityandrisk.

1

https://google.github.io/adk-docs/

Google

01–Introduction:ThepromiseandrisksofAIagents

Ourpathforward:Ahybridapproach

Buildingonwell-establishedprinciplesofsecuresoftwareandsystemsdesign,andinalignmentwithGoogle’sSecureAIFramework(SAIF),2Googleisadvocatingforandimplementingahybridapproach,combiningthestrengthsofbothtraditional,deterministiccontrolsanddynamic,reasoning-baseddefenses.Thiscreatesalayeredsecurityposture—a“defense-in-depthapproach”3—thataimstoconstrainpotentialharmwhilepreservingmaximumutility.Thisstrategyisbuiltuponthreecoresecurityprinciplesdetailedlaterinthisdocument.

ThispaperfirstexplainsthetypicalworkflowofanAIagentanditsinherentsecuritytouchpoints.Itthenaddresseskeyrisksagentspose,introducescoresecurityprinciples,anddetailsGoogle’shybriddefense-in-depthstrategy.Throughout,guidingquestionsaresuggestedtohelpframeyourthinking.Aforthcoming,comprehensivewhitepaperwilldelvedeeperintothesetopics,offeringmoreextensivetechnicaldetailsandmitigations.

2

www.saif.google

3

https://google.github.io/building-secure-and-reliable-systems/raw/ch08.html#defense_in_depth

Google’sApproachforSecureAIAgents:AnIntroduction5

02–SecuritychallengesofhowAIagentswork

Google

SecuritychallengesofhowAIagentswork

Tounderstandtheuniquesecurityrisksofagents,it’shelpfultostartwithamentalmodelthatdescribesacommonagentarchitecture.Whiledetailsvary,thereareseveralbroadlyapplicableconcepts.Wewillbrieflydiscusseachandidentifytherisksthatapplytoeachcomponent.

OrchestrationAgentApplication

UserInteraction

Application

System

instructions

Userquerydetails

Rendering

Outputtransformation

Perception

Inputtransformation

Model(s)

ReasoningandplanningLLM

Reasoningcore

Model(s)

Dataprocessing

Agentmemory

Content(RAG)

Tools

Figure1:Asimplifiedconceptualagentarchitectureforvisualizingrelevantsecurityconsiderations

02–SecuritychallengesofhowAIagentswork

Google’sApproachforSecureAIAgents:AnIntroduction7

Input,perceptionandpersonalization:Agentsbeginbyreceivinginput.Thisinputcanbeadirectuserinstruction(typedcommand,voicequery)orcontextualdatagatheredfromtheenvironment(sensorread-ings,applicationstate,recentdocuments).Theinput,whichcanbemultimodal(text,image,audio),isprocessedandperceivedbytheagentandoftentransformedintoaformattheAImodelcanunderstand.

Securityimplication:Acriticalchallengehereisreliablydistinguishingtrustedusercommandsfrompotentiallyuntrustedcontextualdataandinputsfromothersources(forexample,contentwithinanemailorwebpage).Failuretodosoopensthedoortopromptinjectionattacks,wheremaliciousinstruc-tionshiddenindatacanhijacktheagent.Secureagentsmustcarefullyparseandseparatetheseinputstreams.Personalizationfeatures,whereagentslearnuserpreferences,alsoneedcontrolstopreventmanipulationordatacontaminationacrossusers.

Questionstoconsider

Whattypesofinputsdoestheagentprocess,andcanitclearlydistinguishtrusteduserinputsfrompotentiallyuntrustedcontextualinputs?

Doestheagentactimmediatelyinresponsetoinputsordoesitperformactionsasynchronouslywhentheusermaynotbepresenttoprovideoversight?

Istheuserabletoinspect,approve,andrevokepermissionsforagentactions,memory,andpersonalizationfeatures?

Ifanagenthasmultipleusers,howdoesitensureitknowswhichuserisgivinginstructions,applytherightpermissionsforthatuser,andkeepeachuser’smemoryisolated?

Systeminstructions:Theagent’scoremodeloperatesonacombinedinputintheformofastructuredprompt.Thispromptintegratespredefinedsysteminstructions(whichdefinetheagent’spurpose,capabil-ities,andboundaries)withthespecificuserqueryandvariousdatasourceslikeagentmemoryorexternallyretrievedinformation.

Securityimplication:Acrucialsecuritymeasureinvolvesclearlydelimitingandseparatingthesedif-ferentelementswithintheprompt.Maintaininganunambiguousdistinctionbetweentrustedsysteminstructionsandpotentiallyuntrusteduserdataorexternalcontentisimportantformitigatingpromptinjectionattacks.

Reasoningandplanning:Theprocessedinput,combinedwithsysteminstructionsdefiningtheagent’spur-poseandcapabilities,isfedintothecoreAImodel.Thismodelreasonsabouttheuser’sgoalanddevelopsaplan—oftenasequenceofstepsinvolvinginformationretrievalandtoolusage—toachieveit.Thisplanningcanbeiterative,refiningtheplanbasedonnewinformationortoolfeedback.

Securityimplication:BecauseLLMplanningisprobabilistic,it’sinherentlyunpredictableandpronetoerrorsfrommisinterpretation.Furthermore,currentLLMarchitecturesdonotproviderigoroussepara-tionbetweenconstituentpartsofaprompt(inparticular,systemanduserinstructionsversusexternal,untrustworthyinputs),makingthemsusceptibletomanipulationlikepromptinjection.Thecommonpracticeofiterativeplanning(ina“reasoningloop”)exacerbatesthisrisk:eachcycleintroducesoppor-tunitiesforflawedlogic,divergencefromintent,orhijackingbymaliciousdata,potentiallycompoundingissues.Consequently,agentswithhighautonomyundertakingcomplex,multi-stepiterativeplanningpresentasignificantlyhigherrisk,demandingrobustsecuritycontrols.

02–SecuritychallengesofhowAIagentswork

Google

Questionstoconsider

Howdoestheagenthandleambiguousinstructionsorconflictinggoals,andcanitrequestuserclarification?

Whatlevelofautonomydoestheagenthaveinplanningandselectingwhichplantoexecute,andarethereconstraintsonplancomplexityorlength?

Doestheagentrequireuserconfirmationbeforeexecutinghigh-riskorirreversibleactions?

Orchestrationandactionexecution(tooluse):Toexecuteitsplan,theagentinteractswithexternal

systemsorresourcesvia“tools”or“actions.”ThesecouldbethroughAPIsforsendingemails,querying

databases,accessingfilesystems,controllingsmartdevices,oreveninteractingwithwebbrowserelements.

Theagentselectstheappropriatetoolandprovidesthenecessaryparametersbasedonitsplan.

Securityimplication:Thisstageiswhererogueplanstranslateintoreal-worldimpact.Eachtoolgrantstheagentspecificpowers.Uncontrolledaccesstopowerfulactions(suchasdeletingfiles,makingpur-chases,transferringdata,andevenadjustingsettingsonmedicaldevices)ishighlyriskyiftheplanningphaseiscompromised.Secureorchestrationrequiresrobustauthenticationandauthorizationfortooluse,ensuringtheagenthasappropriatelyconstrainedpermissions(reducedprivilege)forthetaskathand.Dynamicallyincorporatingnewtools,especiallythird-partyones,introducesrisksrelatedtodeceptivetooldescriptionsorinsecureimplementations.

Questionstoconsider

Isthesetofavailableagentactionsclearlydefined,andcanuserseasilyinspectactions,under-standtheirimplications,andprovideconsent?

Howareactionswithpotentiallysevereconsequencesidentifiedandsubjectedtospecificcontrolsorconfinement?

Whatsafeguards(suchassandboxingpolicies,usercontrols,andsensitivedeploymentexclu-sions)preventagentactionsfromimproperlyexposinghigh-privilegeinformationorcapabilitiesinlow-privilegecontexts?

Agentmemory:Manyagentsmaintainsomeformofmemorytoretaincontextacrossinteractions,storelearneduserpreferences,orrememberfactsfromprevioustasks.

Securityimplication:Memorycanbecomeavectorforpersistentattacks.Ifmaliciousdatacontainingapromptinjectionisprocessedandstoredinmemory(forexample,asa“fact”summarizedfromamaliciousdocument),itcouldinfluencetheagent’sbehaviorinfuture,unrelatedinteractions.Memoryimplementationsmustensurestrictisolationbetweenusersandpotentiallybetweendifferentcon-textsforthesameusertopreventcontamination.Usersalsoneedtransparencyandcontroloveragentmemory.Understandingthesestageshighlightshowvulnerabilitiescanarisethroughouttheagent’soperationalcycle,necessitatingsecuritycontrolsateachcriticaljuncture.

02–SecuritychallengesofhowAIagentswork

Google’sApproachforSecureAIAgents:AnIntroduction9

Responserendering:Thisstagetakestheagent’sfinalgeneratedoutputandformatsitfordisplaywithintheuser’sapplicationinterfacesuchasawebbrowserormobileapp.

Securityimplication:Iftheapplicationrendersagentoutputwithoutpropersanitizationorescapingbasedoncontenttype,vulnerabilitieslikeCross-SiteScripting(XSS)ordataexfiltration(frommaliciouslycraftedURLsinimagetags,forexample)canoccur.Robustsanitizationbytherenderingcomponentiscrucial.

Questionstoconsider

Howisagentmemoryisolatedbetweendifferentusersandcontextstopreventdataleakageorcross-contamination?

Whatstopsstoredmaliciousinputs(likepromptinjections)fromcausingpersistentharm?

Whatsanitizationandescapingprocessesareappliedwhenrenderingagent-generatedoutputtopreventexecutionvulnerabilities(suchasXSS)?

Howisrenderedagentoutput,especiallygeneratedURLsorembeddedcontent,validatedtopreventsensitivedatadisclosure?

03–KeyrisksassociatedwithAIagents

Google

KeyrisksassociatedwithAIagents

Wethinktheinherentdesignofagents,combinedwiththeirpowerfulcapabilities,canexposeuserstotwomajorrisks,whatwecallrogueactionsandsensitivedatadisclosure.Thefollowingsectionexaminesthesetworisksandmethodsattackersusetorealizethem.

OrchestrationAgentApplication

Application

Perception

Inputtransformation

Rendering

Outputtransformation

System

instructions

Userquery

details

Reasoningcore

UserInteraction

2

1

Model(s)

ReasoningandplanningLLM

Model(s)

Dataprocessing

Content(RAG)

Agentmemory

1

2

Tools

Figure2:RisksassociatedwithAIagentsacrosstheagentarchitecture:Rogueactions(1)andSensitivedatadisclosure(2)

03–KeyrisksassociatedwithAIagents

Google’sApproachforSecureAIAgents:AnIntroduction,,

Risk1:Rogueactions

Rogueactions—unintended,harmful,orpolicy-violatingagentbehav-iors—representaprimarysecurityriskforAIagents.

Akeycauseispromptinjection:maliciousinstructionshiddenwithinprocesseddata(likefiles,emails,orwebsites)cantricktheagent’scoreAImodel,hijackingitsplanningorreasoningphases.Themodelmisinterpretsthisembeddeddataasinstructions,causingittoexecuteattackercommandsusingtheuser’sauthority.Forexample,anagentprocessingamaliciousemailmightbemanipulatedintoleakinguserdatainsteadofperformingtherequestedtask.

Rogueactionscanalsooccurwithoutmaliciousinput,stemminginsteadfromfundamentalmisalignmentormisinterpretation.Theagentmightmisunderstandambiguousinstructionsorcontext.Forinstance,anambiguousrequestlike“emailMikeabouttheprojectupdate”couldleadtheagenttoselectthewrongcon-tact,inadvertentlysharingsensitiveinformation.Suchcasesinvolveharmfuldivergencefromuserintentduetotheagent’sinterpretation,notexternalcompromise.

Additionally,unexpectednegativeoutcomescanariseiftheagentmisinterpretscomplexinteractionswithexternaltoolsorenvironments.Forexample,itmightmisinterpretthefunctionofbuttonsorformsonacomplexwebsite,leadingtoaccidentalpurchasesorunintendeddatasubmissionswhentryingtoexecuteaplannedaction.

Thepotentialimpactofanyrogueactionscalesdirectlywiththeagent’sauthorizedcapabilitiesandtoolaccess.Thepotentialforfinancialloss,databreaches,systemdisruption,reputationaldamage,andevenphysicalsafetyrisksescalatesdramaticallywiththesensitivityandreal-worldimpactoftheactionstheagentispermittedtotake.

Risk2:Sensitivedatadisclosure

Thiscriticalriskinvolvesanagentimproperlyrevealingprivateorcon-

fidentialinformation.Aprimarymethodforachievingsensitivedata

disclosureisdataexfiltration.Thisinvolvestrickingtheagentintomak-

ingsensitiveinformationvisibletoanattacker.Attackersoftenachieve

thisbyexploitingagentactionsandtheirsideeffects,typically

drivenbypromptinjection.Attackerscanmethodicallyguideanagent

throughasequenceofactions.Theymighttricktheagentintoretrieving

sensitivedataandthenleakingitthroughactions,suchasembedding

datainaURLtheagentispromptedtovisit,orhidingsecretsincode

commitmessages.

Alternatively,datacanbeleakedbymanipulatingtheagent,soutputgeneration.Anattackermighttricktheagentintoincludingsensitivedatadirectlyinitsresponse(liketextorMarkdown).Ifthisoutputisren-deredinsecurelybytheapplication(becauseitlacksappropriatevalidationorsanitizationfordisplayinabrowser,forexample),thedatacanbeexposed.ThiscanhappenthroughcraftedimageURLshiddeninMarkdownthatleakdatawhenfetched,forinstance.ThisvectorcanalsoleadtoCross-SiteScripting(XSS).

Theimpactofdatadisclosureissevere,potentiallyleadingtoprivacybreaches,intellectualpropertyloss,complianceviolations,orevenaccounttakeover,andthedamageisoftenirreversible.

Mitigatingthesediverseandpotentrisksrequiresadeliberate,multi-facetedsecuritystrategygroundedinclear,actionableprinciples.

04–Coreprinciplesforagentsecurity

Google

Coreprinciplesforagentsecurity

Tomitigatetherisksofagentswhilebenefitingfromtheirimmensepotential,weproposethatagenticproductdevelopersshouldadoptthreecoreprinciplesforagentsecurity.Foreachprinciple,werecommendcontrolsortechniquesforyoutoconsider.

OrchestrationAgentApplication

UserInteraction

1Application

Perception

Inputtransformation

2

Rendering

Outputtransformation

Userquerydetails

System

instructions

Reasoningcore

Model(s)

ReasoningandplanningLLM

2

Content(RAG)

Agentmemory

Model(s)

Dataprocessing

Tools

,.

>3

>3

Figure3:ControlsrelevanttoAIagents:Agentusercontrols(1),Agentpermissions(2),andAgentobservability(3)

04–Coreprinciplesforagentsecurity

Google’sApproachforSecureAIAgents:AnIntroduction13

Principle1:Agentsmusthavewell-definedhumancontrollers

Agentstypicallyactasproxiesorassistantsforhumans,inheritingprivilegestoaccessresourcesandperformactions.Therefore,itisessentialforsecurityandaccountabilitythatagentsoperateunderclearhumanoversight.Everyagentmusthaveawell-definedsetofcontrollinghumanuser(s).Thisprinciplemandatesthatsystemsmustbeabletoreliablydistinguishinstructionsoriginatingfromanautho-rizedcontrollinguserversusanyotherinput,especiallypotentiallyuntrusteddataprocessedbytheagent.Foractionsdeemedcriticalorirreversible—suchasdeletinglargeamountsofdata,authorizingsignif-icantfinancialtransactions,orchangingsecuritysettings—thesystemshouldrequireexplicithumanconfirmationbeforeproceeding,ensur-ingtheuserremainsintheloop.

Furthermore,scenariosinvolvingmultipleusersoragentsrequirecarefulconsideration.Agentsactingonbehalfofteamsorgroupsneeddistinctidentitiesandclearauthorizationmodelstopreventunauthorizedcross-userdataaccessoroneuserinadvertentlytriggeringactionsimpactinganother.Usersshouldbegiventhetoolstograntmoregranularpermissionswhentheagentisshared,comparedtothecoarse-grainedpermissionsthatmightbeappropriateforasingle-useragent.Similarly,ifagentconfigurationsorcustompromptscanbeshared,theprocessmustbetransparent,ensuringusersunderstandexactlyhowasharedconfigurationmightaltertheagent’sbehaviorandpotentialactions.

Controls:ThisprinciplereliesoneffectiveAgentUserControls,supportedbyinfrastructurethatprovidesdistinctagentidentitiesandsecureinputchannelstodifferentiateusercommands.

Principle2:Agentpowersmusthavelimitations

Anagent’spowers—theactionsitcantakeandtheresourcesitcanaccess—mustbecarefullylimitedinalignmentwithitsintendedpur-poseanditscontrollinguser’srisktolerance.Forexample,anagentdesignedforresearchshouldnotpossessthepowertomodifyfinancialaccounts.General-purposeagentsneedmechanismstodynamicallyconfinetheircapabilitiesatruntime,ensuringonlyrelevantpermissionsareactiveforanygivenquery(forexample,disallowingfiledeletionactionswhenthetaskiscreativewriting).

nl,

Thisprincipleextendstraditionalleastprivilegebyrequiringanagent’spermissionstobedynamicallyalignedwithitsspecificpurposeandcurrentuserintent,ratherthanjustbeingstaticallyminimized.Thisdistinctioniscrucialbecausemodel-basedagentsoperateinpotentiallyunboundedecosystemsandaresusceptibletoreasoningerrors,necessitatingcontext-awarelimitationsthatgobeyondthetypicalscopeconsideredforhumanusersortraditionalsoftware.

Thisprinciplenecessitatesthatmaximumpermissionlevelsmustbedefinableandenforceable.Critically,agentsmustbepreventedfromescalatingtheirownprivilegesbeyondexplicitlypre-authorizedscopes.Usersmustalsoretaintheabilitytoinspectandrevokeanyauthoritydelegatedtoanagent.ImplementingthisrequiresadaptingandstrengtheningfundamentalAuthentication,Authorization,andAuditing(AAA)infrastructureforagents,includingverifiableagentidentities,granularpermissionsystems,andsecureman-agementofcredentialslikescopedOAuthtokens.

04–Coreprinciplesforagentsecurity

Google

Controls:ImplementingthisprinciplerequiresdefinedAgentPermissionscontrols,enforcedbyrobustAuthentication,Authorization,andAuditing(AAA)infrastructureadaptedforagents,andutilizingscopedcredentialslikeOAuthtokenstomanageaccesssecurely.

Principle3:Agentactionsandplanningmustbeobservable

Trust,effectivedebugging,securityauditing,incidentresponse,anduserunderstandingallhingeontransparencyintotheagent’sactivities.Wecannotensureanagentisactingfaithfullyordiagnoseproblemsifitsoperationsareentirelyopaque.Therefore,agentactions,andwherefeasible,theirplanningprocesses,mustbeobservableandauditable.Thisrequiresimplementingrobustloggingacrosstheagent’sarchi-tecturetocapturecriticalinformationsuchasinputsreceived,toolsinvoked,parameterspassed,outputsgenerated,andideally,interme-diatereasoningsteps.Thisloggingmustbedonesecurely,protectingsensitivedatawithinthelogsthemselves.

Effectiveobservabilityalsomeansthatthepropertiesoftheactionsanagentcantake—suchaswhetheranactionisread-onlyversusstate-changing,orifithandlessensitivedata—mustbeclearlycharacterized.Thismetadataiscrucialforautomatedsecuritymechanismsandhumanreviewers.Finally,userinterfacesshouldbedesignedtopromotetransparency,providinguserswithinsightsintotheagent’s“thoughtprocess,”thedatasourcesitconsulted,ortheactionsitintendstotake,especiallyforcomplexorhigh-riskoperations.Thisrequiresinfrastructureinvestmentsinsecure,centralizedloggingsystemsandAPIsthatexposeactioncharacteristicsunderstandably.

Controls:EffectiveAgentObservabilitycontrolsarecrucial,necessitatinginfrastructureinvestmentsinsecure,centralizedloggingsystemsandstandardizedAPIsthatclearlycharacterizeactionpropertiesandpotentialsideeffects.

Thesethreeprinciplescollectivelyformastrategicframeworkformitigatingagentrisks.

Principle

1.Human

controllers

Summary

Ensuresaccountability,usercontrol,andpreventsagentsfromactingautonomouslyincriticalsituationswithoutclearhuman

oversightorattribution.

KeyControlFocus

Agentuser

controls

InfrastructureNeeds

Distinctagentidentities,

userconsentmechanisms,

secureinputs

2.Limitedpowers

Enforcesappropriate,dynamicallylimitedprivileges,ensuringagentshaveonlythe

capabilitiesandpermissionsnecessaryfortheirintendedpurposeandcannotescalateprivilegesinappropriately.

Agentpermissions

RobustAAAforagents,scopedcredentialmanagement,

sandboxing

3.Observableactions

Requirestransparencyandauditability

throughrobustloggingofinputs,reasoning,actions,andoutputs,enablingsecurity

decisionsanduserunderstanding.

Agent

observability

Secure/centralizedlogging,characterizedactionAPIs,transparentUX

Figure4:Asummaryof

agentsecurityprinciples,controls,andhigh-level

infrastructureneeds

05–Google’sapproach:Ahybriddefense-in-depth

Google’sApproachforSecureAIAgents:AnIntroduction15

Google’sapproach:Ahybriddefense-in-depth

GiventheinherentlimitationsofcurrentAImodelsandthepracticalimpossibilityofguaranteeingper-fectalignmentagainstallpotentialthreats,Googleemploysadefense-in-depthstrategycenteredaroundahybridapproach.Thisapproachstrategicallycombinestraditional,deterministicsecuritymeasureswithdynamic,reasoning-baseddefenses.Thegoalistocreaterobustboundariesaroundtheagentsoperationalenvironment,significantlymitigatingtheriskofharmfuloutcomes,particularlyrogueactionsstemmingfrompromptinjection,whilestrivingtopreservetheagentsutility.

Thisdefense-in-depthapproachreliesonenforcedboundariesaroundtheAIagentsoperationalenviron-menttopreventpotentialworst-casescenarios,actingasguardrailseveniftheagentsinternalreasoningprocessbecomescompromisedormisalignedbysophisticatedattacksorunexpectedinputs.Thismulti-layeredapproachrecognizesthatneitherpurelyrule-basedsystemsnorpurelyAI-basedjudgmentaresufficientontheirown.

Examplesofnew

vulnerabilities

Runtimepolicyenforcement

Dependableconstraintson

agentprivileges

Hardeningofthebasemodel,classifiers,andsafetyfine-tuning

Reasoning-baseddefenses

Regressiontesting

VariantAnalysis

Testingforregressions,variants,andnewvulnerabilities

RedTeams&HumanReviewers

AIAgent

Application

PerceptionRendering

Reasoningcore

Orchestration

Figure5:Googleshybri

溫馨提示

  • 1. 本站所有資源如無特殊說明,都需要本地電腦安裝OFFICE2007和PDF閱讀器。圖紙軟件為CAD,CAXA,PROE,UG,SolidWorks等.壓縮文件請下載最新的WinRAR軟件解壓。
  • 2. 本站的文檔不包含任何第三方提供的附件圖紙等,如果需要附件,請聯(lián)系上傳者。文件的所有權(quán)益歸上傳用戶所有。
  • 3. 本站RAR壓縮包中若帶圖紙,網(wǎng)頁內(nèi)容里面會有圖紙預(yù)覽,若沒有圖紙預(yù)覽就沒有圖紙。
  • 4. 未經(jīng)權(quán)益所有人同意不得將文件中的內(nèi)容挪作商業(yè)或盈利用途。
  • 5. 人人文庫網(wǎng)僅提供信息存儲空間,僅對用戶上傳內(nèi)容的表現(xiàn)方式做保護處理,對用戶上傳分享的文檔內(nèi)容本身不做任何修改或編輯,并不能對任何下載內(nèi)容負(fù)責(zé)。
  • 6. 下載文件中如有侵權(quán)或不適當(dāng)內(nèi)容,請與我們聯(lián)系,我們立即糾正。
  • 7. 本站不保證下載資源的準(zhǔn)確性、安全性和完整性, 同時也不承擔(dān)用戶因使用這些下載資源對自己和他人造成任何形式的傷害或損失。

最新文檔

評論

0/150

提交評論