October 2025
Executive Summary

With recent advancements in artificial intelligence—particularly powerful generative models—private and public sector actors have heralded the benefits of incorporating AI more prominently into our daily lives. Frequently cited benefits include increased productivity, efficiency, and personalization. However, the harm caused by AI is not yet fully understood. As a result of wider AI deployment and use, the number of AI harm incidents has surged in recent years, suggesting that current approaches to harm prevention may be falling short. This report argues that this is due to a limited understanding of how AI risks materialize in practice. Leveraging AI incident reports from the AI Incident Database, it analyzes how AI deployment results in harm and identifies six key mechanisms that describe this process (Table 1).
Table 1: The Six AI Harm Mechanisms

Intentional Harm
● Harm by design
● AI misuse
● Attacks on AI systems

Unintentional Harm
● AI failures
● Failures of human oversight
● Integration harm
A review of AI incidents associated with these mechanisms leads to several key takeaways that should inform AI governance approaches in the future.

1. A one-size-fits-all approach to harm prevention will fall short. This report illustrates the diverse pathways to AI harm and the wide range of actors involved. Effective mitigation requires an equally diverse response strategy that includes sociotechnical approaches. Adopting model-based approaches alone could especially neglect integration harms and failures of human oversight.

2. To date, risk of harm correlates only weakly with model capabilities. This report illustrates many instances of harm that implicate single-purpose AI systems. Yet many policy approaches use broad model capabilities, often proxied by computing power, as a predictor for the propensity to do harm. This fails to mitigate the significant risk associated with the irresponsible design, development, and deployment of less powerful AI systems.

3. Tracking AI incidents offers invaluable insights into real AI risks and helps build response capacity. Technical innovation, experimentation with new use cases, and novel attack strategies will result in new AI harm incidents in the
future.Keepingpacewiththesedevelopmentsrequiresrapidadaptationandagileresponses.ComprehensiveAIincidentreportingallowsforlearningandadaptationatanacceleratedpace,enablingimprovedmitigationstrategiesandidentificationofnovelAIrisksastheyemerge.Incidentreportingmustbe
recognizedasacriticalpolicytooltoaddressAIrisks.
Table of Contents

Executive Summary
Introduction
Methodology
Limitations
AI Harm Mechanisms
Intentional Harm
Harm by Design
AI Misuse
Attacks on AI Systems
Unintentional Harm
AI Failures
Failures of Human Oversight
Integration Harm
Discussion
Conclusion
Appendix
Authors
Acknowledgments
Endnotes
Introduction

Due to widespread AI use and deployment, AI systems are increasingly implicated in harmful events. Just since the beginning of 2025, 279 new incidents have been added to the AI Incident Database (AIID), a nonprofit effort dedicated to tracking realized harm from AI deployment.1 Since its launch in November 2020, the database has collected and indexed more than 1,200 incidents of harm or near misses involving algorithmic systems and AI.* Clearly, more efforts are needed to prevent such AI harm. Preemptive harm prevention is the underlying goal pursued by most AI governance interventions, be it regulations like the European Union's AI Act, executive guidance like the Office of Management and Budget's memorandum M-25-21, or company frameworks like Anthropic's Responsible Scaling Policy.2 Preventing harm effectively, however, requires a better understanding of how AI use leads to harmful outcomes in practice, rather than in theory.
AI incidents data provide valuable insights for understanding how AI systems can cause real harm. By collecting, indexing, and archiving reports from hundreds of real-world AI incidents, the AIID has created a treasure trove of data describing not only the myriad harms AI systems have been implicated in, but also how these harms came to be. CSET previously leveraged this data to create an analytical framework that provides fundamental definitions and classification schemes for incident data analysis.3 This framework was then used to annotate and classify more than 200 incidents from the database by incident type, harm category, and other dimensions.4
This past analytical work serves as the foundation for this report, which describes the variety of forces involved in AI harm. Leveraging the more than 200 reviewed cases of real-world harm from the database, it identifies six key "mechanisms of harm," which can be divided into intentional and unintentional harm (Table 2).
* Each incident in the AIID corresponds to one or more instances of harm, so the total number of discrete harm events captured in the database is higher than the number of incident IDs. While some incident IDs correspond to a single documented harm instance, others capture media-constructed accounts that aggregate related incidents into a single narrative.
Table 2: The Six AI Harm Mechanisms

Intentional Harm
● Harm by design: Harm caused by AI systems designed and developed for harmful purposes
● AI misuse: Use of AI systems for harm against the developers' intentions
● Attacks on AI systems: Harm resulting from AI behavior or (in)action caused by cyberattacks

Unintentional Harm
● AI failures: Harm caused by AI errors, malfunctions, or bias
● Failures of human oversight: Harm resulting from the failure of human-machine teams
● Integration harm: Harm resulting as an unintended consequence of deployment in a given context
The six mechanisms comprehensively describe the various pathways to harm found in the AIID. As such, they provide a richer understanding of how AI risks materialize in practice, which can help guide mitigation strategies.
These harm mechanisms may have immediate policy relevance for companies hoping to comply with regulations like the European Union's AI Act. The EU recently released a code of practice for general-purpose AI. This voluntary compliance tool was developed to help providers of general-purpose AI models adhere to the act's requirements, and lays out a comprehensive risk management process. Under it, developers must engage in risk modeling, described as "a structured process aimed at specifying pathways through which a systemic risk stemming from a model might materialize."5 This framework of harm mechanisms, built on empirical evidence of real-world incidents, may serve as a useful starting point for this exercise.
Importantly, these diverse mechanisms need to be addressed through an equally wide range of mitigation strategies. Security practices may serve to alleviate risks of misuse and attack, but do nothing to address integration harms. Performance standards and testing protocols can reduce AI failures but won't mitigate limitations of human oversight. To prevent AI harm effectively, a diverse toolbox is required.
The following sections present harm incidents from the AIID to illustrate the six harm mechanisms and deepen readers' understanding of the variety of ways in which AI can cause harm. While this report does not intend to provide a comprehensive overview of mitigation techniques, it highlights measures that combat specific mechanisms where possible, and describes where more research is needed to find effective mitigations to particular challenges.
Methodology

Analysis for this report took place in two stages. The first stage involved the identification of the six harm mechanisms based on in-depth study of AI incidents. In previous research, CSET developed a standardized analytical framework for the study of AI harms.6 The development of this framework involved an iterative process of incident review and framework adaptation through which the key elements required for the identification of real-world AI harm, their basic relational structure, and their definitions were identified. A central conceptual component of the framework is the "chain of harm," i.e., the series of events between AI deployment and the incident outcome that leads to harm. The chain of harm serves an essential function in the identification of AI harm: For there to be AI harm, there has to be a direct link between the behavior of the AI system and the harm that occurred.

The framework was applied to more than 200 incidents from the AIID, which were annotated and classified by incident type, harm category, and other variables.7 This analytical process and the corresponding in-depth investigation of a large number of incidents showed that the chain of harm was characterized by a variety of forces that shaped how incidents unfolded: the harm mechanisms. While errors—both human and AI—played a role, so did intentionally harmful uses of AI systems, and on occasion the integration of AI into a specific deployment context was harmful on its own. The six harm mechanisms presented above were derived from the analysis of these repeated patterns of events in the chains of harm.
The second stage involved validating the derived mechanisms by categorizing a random set of 200 incidents from the AIID. This was necessary for two reasons. First, the sample of incidents to which the AI Harm Framework had originally been applied had not been randomly selected, and was thus not representative of incidents in the AIID at the time. Second, the number of incidents in the AIID had grown by over 50% since the beginning of the first stage in 2023. Validation was therefore essential to ensure the mechanisms remain applicable and relevant to a newer set of incidents that more accurately reflect the current level of technological innovation. The result of this exercise is shown in Figure 1 in the appendix.
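To make the validation step concrete, the minimal sketch below tallies a random sample of annotated incidents by harm mechanism. It is only an illustration of the sampling-and-counting logic under assumed inputs; the incident records are hypothetical and the code does not represent CSET's actual annotation pipeline.

```python
# Illustrative sketch of the second-stage validation step: draw a random
# sample of incidents and tally which harm mechanism each was assigned.
# Mechanism labels follow Table 2; the incident records are hypothetical.

import random
from collections import Counter

MECHANISMS = [
    "harm by design", "AI misuse", "attacks on AI systems",
    "AI failures", "failures of human oversight", "integration harm",
]

def validate_sample(incidents: list[dict], sample_size: int = 200, seed: int = 0) -> Counter:
    """Randomly sample incidents and count how many fall under each mechanism."""
    rng = random.Random(seed)
    sample = rng.sample(incidents, k=min(sample_size, len(incidents)))
    return Counter(incident["mechanism"] for incident in sample)

# Hypothetical annotated incidents standing in for AIID records.
incidents = [{"id": i, "mechanism": random.choice(MECHANISMS)} for i in range(1200)]
print(validate_sample(incidents))
```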
Limitations

All frameworks and models are necessarily abstract and simplified representations of reality. In the real world, harm mechanisms are often not as clear-cut as they appear in this report, and several mechanisms can be active simultaneously. Attacks on AI systems are often carried out to enable misuse. Model performance issues and human
oversight failures occur at the same time. And systems designed for harm can fail and cause unintended or excessive harm.
Representing intentionality as a binary is useful to help distinguish the different actors that interventions need to target. In reality, however, intentionality is a spectrum. Some incidents occur as truly unintended and unforeseeable consequences of AI use, and others are obviously intentional. But many sit in between, resulting from developers' and deployers' negligence in considering potential impacts, or even from reckless disregard of easily foreseeable harm. This fluidity creates edge cases that render the distinction between harm by design and integration harm difficult. Judging these cases with certainty would require insights into the developers' and deployers' decision-making and governance processes that are generally not available. Thus, unless there is evidence to the contrary, this report relies on the assumption that harm was unintentional when categorizing edge case incidents. Future work may address this limitation through a more disaggregated representation of intentionality.
Lack of information can similarly impede the distinction between harm by design and misuse. Since outside observers generally cannot discern the underlying AI model in a system that is involved in harm, it is often unclear whether an existing AI model was misused or one was designed for this purpose.8 Distinguishing between the two categories is nonetheless worthwhile, because mitigation requires different interventions based on which actor along the AI value chain intends to do harm; the model developer precipitates harm by design, whereas the AI system's user causes harm from misuse.* Even when they share some overlap, separation allows us to identify the appropriate mitigation measures to address each mechanism more effectively.
Finally, there are two limitations of the data source. Although the AIID represents the most comprehensive collection of AI incidents and harms to date, the distribution of incident types in the database does not necessarily reflect their prevalence in the real world. Since the database depends on journalistic reporting, it represents the practices and biases of the media ecosystem. As such, it overrepresents incidents that are attention-grabbing or associated with current societal debates. This suggests that less
* Distinguishing developers, deployers, and users is not always straightforward, and sometimes one entity occupies multiple roles. For example, ChatGPT is an AI system that is both developed and deployed by OpenAI. Individuals interacting with the chatbot are the users. In a scenario where OpenAI builds a customer service chatbot on top of their language model GPT-5 for a third party (e.g., an insurance company), the developer is still OpenAI but the deployer is the insurance company, and the company's customers are the chatbot's users.
spectacular mechanisms like integration harm might be underrepresented compared to instances of human misuse, which have driven much of the societal debate recently, particularly where it relates to generative AI systems.
Lastly, there are many harms from AI that cannot easily be captured in an incident database because they do not materialize in discrete instances. The consequences of AI energy consumption, the detrimental impact of AI overreliance on education, or the adverse effects of AI companions on human relationships are just a few examples of possible harms that rarely present as individual incidents. While worthy of analysis, these types of harms are not within the scope of this study.
AI Harm Mechanisms

Intentional Harm

Harm by Design

AI systems designed with the intention of causing harm represent the most straightforward of the six harm mechanisms. In this case, the developer designs the AI system to perform an inherently harmful task or to be used in harmful ways. Developers' and users' intentions to cause harm are generally aligned in incidents of harm by design, though as the following examples illustrate, the types of harm-by-design systems vary widely.
Some AI systems developed for defense and law enforcement, such as AI-enabled intelligence analysis and battlespace management systems used for targeting, or autonomous weapons systems with AI-enabled navigation, computer vision, or terminal guidance, are obvious examples of harm-by-design systems. No malfunctions or misuse need to occur for harm to materialize when these AI systems are used, since harm is the intended outcome, both by developer and deployer. Militaries may appropriately use these systems against lawful combatants when deployed in accordance with the law of armed conflict. Recent conflicts involving Ukraine and Israel have reportedly seen AI-enabled systems capable of causing harm deployed in combat.9
But harm by design is also prevalent outside of law enforcement and defense contexts. Deepfake apps that allow users to maliciously create nonconsensual intimate imagery (NCII) abound. There are dozens of such incidents recorded in the AIID—far too many to describe individually.10 The harm overwhelmingly affects women and girls, and as such, these incidents provide tangible evidence of a fast-growing form of gender-based digital violence.11 While this is not a new problem nor even a new AI capability (image generators have been around since approximately 2017), incidents involving pornographic content have surged since AI "nudify" apps and pornographic video generators have become more widely available online.12
There are also more subtle forms of intentionally harmful algorithmic design. Online marketplaces such as Naver, Coupang, and Amazon India have been accused of engaging in unfair competitive practices through algorithmic manipulation.13 The companies allegedly rigged the recommender systems and search algorithms powering their platforms to favor their own products and brands, boosting their market share and causing economic and financial harm to their competitors. Exploiting their dominance as
an online market platform to promote their own brand as a vendor violates antitrust laws and, given the platforms' wide online reach and customer base, the overall impact and scale of harm from even minor manipulation can be significant.
Mitigating Harm by Design

In general, the choice of approach to addressing harm by design depends on whether the intended harm is considered socially acceptable or necessary. Prohibiting the development and deployment of AI systems for certain use cases can be an effective measure for use cases causing unacceptable harms.

In contexts where harm by design is deemed acceptable, such as defense and law enforcement functions, the goal of governance is not to prevent harm entirely but to reduce it to what is necessary in a clearly defined and contestable framework. Institutional policy can help ensure the responsible deployment of the technology in order to prevent excessive harm and negligent use or abuse. Generally, organizations should establish AI governance principles that clearly define the circumstances and conditions under which reliance on autonomous or AI-powered decisions and actions is acceptable. A solid accountability framework with clearly assigned roles and responsibilities can ensure that any decision-making that leads to harm is transparent and tractable. Institutional oversight bodies, assuming sufficient independence and transparency, should then be authorized to investigate and audit AI-supported decision-making and actions taken when violations of those policies and frameworks occur or are suspected.
AI Misuse

AI systems that have not been developed with the explicit goal of doing harm can still be misused for that purpose. Compared to the harm by design mechanism, where the intent to harm lies with both the developer and user, in cases of AI misuse the intent to harm lies with the user or operator of the AI system only. Note that AI models can also be misused for non-malicious purposes, such as using AI to do homework.14 While this may cause users to unintentionally harm themselves—for example by detrimentally affecting their own learning—this section is only concerned with intentionally harmful, malicious misuse.15
Both specialized algorithmic systems and general-purpose AI models are prone to malicious misuse, although incidents involving the latter are more common in the AIID.
General-purpose AI models, including large language models and text-to-image generators, perform many different tasks well, which makes them particularly easy to misuse for a range of purposes.
For example, in 2023, users of the online forum 4chan created hateful and violent voice impersonations of celebrities using ElevenLabs' voice synthesis AI model.16 More recently, Microsoft and OpenAI reported on how state-sponsored hackers from North Korea, Iran, Russia, and China had misused ChatGPT for phishing and social engineering attacks targeting defense, cybersecurity, and cryptocurrency sectors.17 Other investigations revealed that ChatGPT had been misused by cybercriminals to create malware and other malicious software.18
Specialized AI systems, which generally serve a single particular purpose, can also be harmfully misused. Ranking online search results provides a troubling example. Malicious actors can be especially effective at exploiting search result ranking with AI systems that exhibit full or very high levels of autonomy, misusing them to achieve harmful, nefarious outcomes.
For instance, antisemitic online groups tagged images of portable ovens on wheels with the label "Jewish baby stroller."19 As a result, if users searched for the term "Jewish baby stroller," Google's algorithm ranked images of the ovens at the top of the search results. This was a direct exploitation of the image search algorithm's functionality, which works by matching the words in a query to the words that appear next to images on a webpage. The strategy succeeded particularly well because of a "data void" related to the search term: Because the product "Jewish baby stroller" doesn't actually exist, the only results available were the offensive images, which were then promoted by the algorithm.20
Malicious users have carried out similarly coordinated activities to trigger content moderation algorithms into removing marginalized creators' social media posts, a tactic known as "adversarial reporting."21 Because content on social media sites is sometimes automatically removed when a sufficiently high number of users flag a post, regardless of whether or not the post actually violates any policies, right-wing trolls have strategically reported posts by influencers belonging to minority groups on TikTok in order to trigger the platform's content moderation algorithm.22 Even if TikTok's appeal and review process finds that the video did not violate community guidelines, penalties become more severe the more frequently a creator's posts are flagged, and can range from content removal to account deletion. Automated content moderation systems are thus exploited to effectively censor marginalized communities online.
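A simplified sketch of threshold-based automated moderation shows how coordinated flagging exploits this design. The threshold, strike counts, and penalty tiers below are hypothetical, not TikTok's documented policy; the point is that removal depends on flag volume rather than on whether the post violates any rule.

```python
# Illustrative sketch of threshold-based automated moderation and how
# coordinated flagging can exploit it. Threshold and penalties are hypothetical.

from dataclasses import dataclass

FLAG_THRESHOLD = 10  # hypothetical: auto-act once this many users report a post

@dataclass
class Creator:
    name: str
    strikes: int = 0

@dataclass
class Post:
    creator: Creator
    violates_policy: bool  # never consulted by report(), which is the exploit
    flags: int = 0

def report(post: Post, num_reports: int) -> str:
    """Apply user reports; action depends only on the flag count."""
    post.flags += num_reports
    if post.flags < FLAG_THRESHOLD:
        return "no action"
    post.creator.strikes += 1
    # Penalties escalate with accumulated strikes, even if later appeals succeed.
    if post.creator.strikes >= 3:
        return "account suspended"
    return "post removed"

creator = Creator("minority_influencer")
for _ in range(3):
    benign_post = Post(creator, violates_policy=False)
    print(report(benign_post, num_reports=15))  # coordinated mass reporting
# -> post removed, post removed, account suspended
```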
The Limits of Technical Mitigations of Misuse Risk

Developers of generative AI models can take steps to control model outputs so as to limit the generation of harmful content.23 Risk-based release strategies that restrict access to particularly capable or advanced models can further help address misuse risks.24 Assessing a model's propensity for misuse requires evaluating whether it can perform a given malicious task (plausibility) and, if so, how well it can do so (performance).25 Red-teaming has emerged as a popular method to uncover misuse plausibility across a wide range of domains, and to surface where additional safeguards are needed.26 Performance assessments should include benchmark evaluations and experiments, and focus on the marginal utility of the model compared to existing modes for task execution.27
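The plausibility/performance distinction can be expressed as a simple scoring routine over red-team results. The sketch below assumes a hypothetical log of pass/fail attempts per malicious task; it is not an established evaluation suite, only an illustration of the two metrics.

```python
# Minimal sketch of scoring red-team results for misuse "plausibility"
# (can the model do the task at all?) and "performance" (how well?).
# Task names and data are hypothetical illustrations.

def evaluate_misuse(attempts: dict[str, list[bool]]) -> dict[str, dict]:
    """attempts maps each red-teamed task to pass/fail outcomes, where
    True means the model produced a usable harmful output."""
    report = {}
    for task, outcomes in attempts.items():
        plausibility = any(outcomes)                 # at least one success
        performance = sum(outcomes) / len(outcomes)  # success rate across attempts
        report[task] = {"plausible": plausibility, "success_rate": performance}
    return report

# Hypothetical red-team log: each entry is one prompt attempt against a guardrailed model.
red_team_log = {
    "phishing_email": [True, True, False, True],
    "malware_snippet": [False, False, False, False],
}
print(evaluate_misuse(red_team_log))
# Success rates should then be compared against non-AI baselines to estimate marginal utility.
```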
Putting in place comprehensive safeguards is exceptionally challenging since it is difficult for developers to anticipate all potential (mis)uses of their model. Most importantly, such interventions at the model level will not necessarily prevent misuse without deteriorating model performance—also known as the Misuse-Use Tradeoff.28 Because AI models lack the context to understand malicious intent, guardrails that prevent them from, for example, writing phishing emails will likely stop them from writing other emails as well. The same holds true for writing code: Guardrails to prevent malware may reduce the quality of innocuous computer programs. At the current state of the art, building an AI system that can never be misused often means building a system that is barely useful for non-malicious purposes. While there are other steps AI developers and deployers can take to prevent model misuse, technical fixes alone will not eliminate misuse risks.
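As a toy illustration of the Misuse-Use Tradeoff, consider a naive keyword guardrail. The blocklist and prompts below are hypothetical; because the filter cannot see intent, it refuses the malicious request and the legitimate one alike.

```python
# Toy illustration of the Misuse-Use Tradeoff: a naive keyword guardrail
# cannot see intent, so blocking "phishing-like" requests also blocks
# legitimate ones. The blocklist and prompts are hypothetical.

BLOCKED_TERMS = {"password", "verify your account", "urgent payment"}

def guardrail(prompt: str) -> str:
    """Refuse any request containing a blocked term, regardless of intent."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "REFUSED"
    return "GENERATED"

malicious = "Write an email telling the user to verify your account at this fake link."
legitimate = "Write an email reminding employees to verify your account settings after the IT migration."

print(guardrail(malicious))   # REFUSED (intended)
print(guardrail(legitimate))  # REFUSED (collateral damage to a benign use)
```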
Attacks on AI Systems

Harm can also result from cyberattacks on AI systems.* As with the misuse mechanism, the AI system developers and deployers do not intend harm here. Instead, the harmful intentions lie with the attackers. The cybersecurity community categorizes attacks into three groups: confidentiality, integrity, and availability attacks.29 Confidentiality attacks aim to extract sensitive information, integrity attacks aim to compromise the model's performance, and availability attacks aim to halt the overall functioning of the model. Lately, an emerging category of attacks aims to circumvent generative models' safeguards.30 Such exploitations of AI systems' security vulnerabilities can potentially lead to harmful outcomes. Moreover, in addition to the standalone harms they cause, attacks on AI systems can enable model misuse.

* Adversarial attacks on AI models can occur at every stage of the AI lifecycle. Since relevant incidents in the AIID result from attacks on deployed systems, this section only covers post-deployment attacks.
While there is ample evidence of the security vulnerabilities of AI systems, most attacks recorded in the AIID still occur in experimental settings that do not lead to real-world harm. For example, security researchers have uncovered vulnerabilities in GitHub Copilot that would enable attackers to modify Copilot's responses or leak the developer's data (confidentiality and integrity attacks).31 Experiments showed that flaws in Tesla's autopilot could be exploited to make the car accelerate and veer into the oncoming traffic lane (an integrity attack).32 Finally, an investigation found that a divergence attack on ChatGPT could force the system to leak training data, including personally identifiable information such as phone numbers and email and physical addresses (a confidentiality attack).33
Harm incidents from the AIID show that in practice, attacks on AI systems are often carried out to evade generative AI model safeguards. This practice, called "jailbreaking," relies on prompt injection attacks in which users come up with text prompts that induce the AI model to behave in ways that violate its policies.34 Prompt injection attacks enabled users to evade ChatGPT's guardrails shortly after its release in order to produce discriminatory and violent content, as well as offer instructions on how to carry out criminal activities.35 Even after models have undergone extensive safety testing and red-teaming, prompt injection attacks remain a popular and effective technique to circumvent guardrails. Hackers using large language models for malware creation employ dozens of prompting strategies for various