
October 2025

Center for Security and Emerging Technology | 1

Executive Summary

With recent advancements in artificial intelligence—particularly, powerful generative models—private and public sector actors have heralded the benefits of incorporating AI more prominently into our daily lives. Frequently cited benefits include increased productivity, efficiency, and personalization. However, the harm caused by AI remains to be more fully understood. As a result of wider AI deployment and use, the number of AI harm incidents has surged in recent years, suggesting that current approaches to harm prevention may be falling short. This report argues that this is due to a limited understanding of how AI risks materialize in practice. Leveraging AI incident reports from the AI Incident Database, it analyzes how AI deployment results in harm and identifies six key mechanisms that describe this process (Table 1).

Table 1: The Six AI Harm Mechanisms

Intentional Harm
● Harm by design
● AI misuse
● Attacks on AI systems

Unintentional Harm
● AI failures
● Failures of human oversight
● Integration harm

A review of AI incidents associated with these mechanisms leads to several key takeaways that should inform AI governance approaches in the future.

1. A one-size-fits-all approach to harm prevention will fall short. This report illustrates the diverse pathways to AI harm and the wide range of actors involved. Effective mitigation requires an equally diverse response strategy that includes sociotechnical approaches. Adopting model-based approaches alone could especially neglect integration harms and failures of human oversight.

2. To date, risk of harm correlates only weakly with model capabilities. This report illustrates many instances of harm that implicate single-purpose AI systems. Yet many policy approaches use broad model capabilities, often proxied by computing power, as a predictor for the propensity to do harm. This fails to mitigate the significant risk associated with the irresponsible design, development, and deployment of less powerful AI systems.

3. Tracking AI incidents offers invaluable insights into real AI risks and helps build response capacity. Technical innovation, experimentation with new use cases, and novel attack strategies will result in new AI harm incidents in the future. Keeping pace with these developments requires rapid adaptation and agile responses. Comprehensive AI incident reporting allows for learning and adaptation at an accelerated pace, enabling improved mitigation strategies and identification of novel AI risks as they emerge. Incident reporting must be recognized as a critical policy tool to address AI risks.


Table of Contents

Executive Summary
Introduction
Methodology
    Limitations
AI Harm Mechanisms
    Intentional Harm
        Harm by Design
        AI Misuse
        Attacks on AI Systems
    Unintentional Harm
        AI Failures
        Failures of Human Oversight
        Integration Harm
Discussion
Conclusion
Appendix
Authors
Acknowledgments
Endnotes


Introduction

Due to widespread AI use and deployment, AI systems are increasingly implicated in harmful events. Just since the beginning of 2025, 279 new incidents have been added to the AI Incident Database (AIID), a nonprofit effort dedicated to tracking realized harm from AI deployment.1 Since its launch in November 2020, the database has collected and indexed more than 1,200 incidents of harm or near misses involving algorithmic systems and AI.*

Clearly, more efforts are needed to prevent such AI harm. Preemptive harm prevention is the underlying goal pursued by most AI governance interventions, be it regulations like the European Union’s AI Act, executive guidance like the Office of Management and Budget’s memorandum M-25-21, or company frameworks like Anthropic’s Responsible Scaling Policy.2 Preventing harm effectively, however, requires a better understanding of how AI use leads to harmful outcomes in practice, rather than in theory.

AI incident data provide valuable insights for understanding how AI systems can cause real harm. By collecting, indexing, and archiving reports from hundreds of real-world AI incidents, the AIID has created a treasure trove of data describing not only the myriad harms AI systems have been implicated in, but also how these harms came to be. CSET previously leveraged this data to create an analytical framework that provides fundamental definitions and classification schemes for incident data analysis.3 This framework was then used to annotate and classify more than 200 incidents from the database by incident type, harm category, and other dimensions.4

This past analytical work serves as the foundation for this report, which describes the variety of forces involved in AI harm. Leveraging the more than 200 reviewed cases of real-world harm from the database, it identifies six key “mechanisms of harm,” which can be divided into intentional and unintentional harm (Table 2).

* Each incident in the AIID corresponds to one or more instances of harm, so the total number of discrete harm events captured in the database is higher than the number of incident IDs. While some incident IDs correspond to a single documented harm instance, others capture media-constructed accounts that aggregate related incidents into a single narrative.


Table 2: The Six AI Harm Mechanisms

Intentional Harm
● Harm by design: Harm caused by AI systems designed and developed for harmful purposes
● AI misuse: Use of AI systems for harm against the developers’ intentions
● Attacks on AI systems: Harm resulting from AI behavior or (in)action caused by cyberattacks

Unintentional Harm
● AI failures: Harm caused by AI errors, malfunctions, or bias
● Failures of human oversight: Harm resulting from the failure of human-machine teams
● Integration harm: Harm resulting as an unintended consequence of deployment in a given context

The six mechanisms comprehensively describe the various pathways to harm found in the AIID. As such, they provide a richer understanding of how AI risks materialize in practice, which can help guide mitigation strategies.
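The six-mechanism taxonomy and its intentional/unintentional split lend themselves to a structured annotation schema. The following is a hypothetical sketch of how an incident record might be tagged under this taxonomy; the field names and example values are illustrative, not CSET’s actual coding scheme:

```python
from dataclasses import dataclass
from enum import Enum

class Mechanism(Enum):
    HARM_BY_DESIGN = "harm by design"
    AI_MISUSE = "AI misuse"
    ATTACK_ON_AI = "attack on AI system"
    AI_FAILURE = "AI failure"
    OVERSIGHT_FAILURE = "failure of human oversight"
    INTEGRATION_HARM = "integration harm"

# The intentional/unintentional split from Table 2.
INTENTIONAL = {Mechanism.HARM_BY_DESIGN, Mechanism.AI_MISUSE, Mechanism.ATTACK_ON_AI}

@dataclass
class Incident:
    aiid_id: int          # incident ID in the AI Incident Database
    mechanism: Mechanism  # primary mechanism observed in the chain of harm
    harm_category: str    # e.g., "physical", "financial", "psychological"

    @property
    def intentional(self) -> bool:
        return self.mechanism in INTENTIONAL

# Hypothetical annotation of a jailbreaking-style incident:
example = Incident(aiid_id=9999, mechanism=Mechanism.ATTACK_ON_AI,
                   harm_category="informational")
print(example.intentional)  # -> True
```

Encoding the taxonomy this way makes the binary intentionality assumption explicit, which is exactly the simplification the Limitations section later qualifies.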

These harm mechanisms may have immediate policy relevance for companies hoping to comply with regulations like the European Union’s AI Act. The EU recently released a code of practice for general-purpose AI. This voluntary compliance tool was developed to help providers of general-purpose AI models adhere to the act’s requirements, and lays out a comprehensive risk management process. Under it, developers must engage in risk modeling, described as “a structured process aimed at specifying pathways through which a systemic risk stemming from a model might materialize.”5 This framework of harm mechanisms, built on empirical evidence of real-world incidents, may serve as a useful starting point for this exercise.

Importantly, these diverse mechanisms need to be addressed through an equally wide range of mitigation strategies. Security practices may serve to alleviate risks of misuse and attack, but do nothing to address integration harms. Performance standards and testing protocols can reduce AI failures but won’t mitigate limitations of human oversight. To prevent AI harm effectively, a diverse toolbox is required.

The following sections present harm incidents from the AIID to illustrate the six harm mechanisms and deepen readers’ understanding of the variety of ways in which AI can cause harm. While this report does not intend to provide a comprehensive overview of mitigation techniques, it highlights measures that combat specific mechanisms where possible, and describes where more research is needed to find effective mitigations to particular challenges.


Methodology

Analysis for this report took place in two stages. The first stage involved the identification of the six harm mechanisms based on in-depth study of AI incidents. In previous research, CSET developed a standardized analytical framework for the study of AI harms.6 The development of this framework involved an iterative process of incident review and framework adaptation through which the key elements required for the identification of real-world AI harm, their basic relational structure, and their definitions were identified. A central conceptual component of the framework is the “chain of harm,” i.e., the series of events between AI deployment and the incident outcome that leads to harm. The chain of harm serves an essential function in the identification of AI harm: For there to be AI harm, there has to be a direct link between the behavior of the AI system and the harm that occurred.

The framework was applied to more than 200 incidents from the AIID, which were annotated and classified by incident type, harm category, and other variables.7 This analytical process and the corresponding in-depth investigation of a large number of incidents showed that the chain of harm was characterized by a variety of forces that shaped how incidents unfolded: the harm mechanisms. While errors—both human and AI—played a role, so did intentionally harmful uses of AI systems, and on occasion the integration of AI into a specific deployment context was harmful on its own. The six harm mechanisms presented above were derived from the analysis of these repeated patterns of events in the chains of harm.

The second stage involved validating the derived mechanisms by categorizing a random set of 200 incidents from the AIID. This was necessary for two reasons. First, the sample of incidents to which the AI Harm Framework had originally been applied had not been randomly selected, and was thus not representative of incidents in the AIID at the time. Second, the number of incidents in the AIID had grown by over 50% since the beginning of the first stage in 2023. Validation was therefore essential to ensure the mechanisms remain applicable and relevant to a newer set of incidents that more accurately reflect the current level of technological innovation. The result of this exercise is shown in Figure 1 in the appendix.

Limitations

All frameworks and models are necessarily an abstract and simplified representation of reality. In the real world, harm mechanisms are often not as clear-cut as they appear in this report, and several mechanisms can be active simultaneously. Attacks on AI systems are often carried out to enable misuse. Model performance issues and human oversight failures occur at the same time. And systems designed for harm can fail and cause unintended or excessive harm.

Representing intentionality as a binary is useful to help distinguish the different actors that interventions need to target. However, in reality intentionality is a spectrum. Some incidents occur as truly unintended and unforeseeable consequences of AI use, and others are obviously intentional. But many sit in between, resulting from developers’ and deployers’ negligence in considering potential impacts, or even from reckless disregard of easily foreseeable harm. This fluidity creates edge cases that render the distinction between harm by design and integration harm difficult. Judging these cases with certainty would require insights into the developers’ and deployers’ decision-making and governance processes that are generally not available. Thus, unless there is evidence to the contrary, this report relies on the assumption that harm was unintentional when categorizing edge case incidents. Future work may address this limitation through a more disaggregated representation of intentionality.

Lack of information can similarly impede the distinction between harm by design and misuse. Since outside observers generally cannot discern the underlying AI model in a system that is involved in harm, it is often unclear whether an existing AI model was misused or one was designed for this purpose.8 Distinguishing between the two categories is nonetheless worthwhile, because mitigation requires different interventions based on which actor along the AI value chain intends to do harm; the model developer precipitates harm by design, whereas the AI system’s user causes harm from misuse.* Even when they share some overlap, separation allows us to identify the appropriate mitigation measures to address each mechanism more effectively.

Finally, there are two limitations of the data source. Although the AIID represents the most comprehensive collection of AI incidents and harms to date, the distribution of incident types in the database does not necessarily reflect their prevalence in the real world. Since the database depends on journalistic reporting, it represents the practices and biases of the media ecosystem. As such, it overrepresents incidents that are attention-grabbing or associated with current societal debates. This suggests that less spectacular mechanisms like integration harm might be underrepresented compared to instances of human misuse, which have driven much of the societal debate recently, particularly where it relates to generative AI systems.

* Distinguishing developers, deployers, and users is not always straightforward, and sometimes one entity occupies multiple roles. For example, ChatGPT is an AI system that is both developed and deployed by OpenAI. Individuals interacting with the chatbot are the users. In a scenario where OpenAI builds a customer service chatbot on top of their language model GPT-5 for a third party (e.g., an insurance company), the developer is still OpenAI but the deployer is the insurance company, and the company’s customers are the chatbot’s users.

Lastly, there are many harms from AI that cannot easily be captured in an incident database because they do not materialize in discrete instances. The consequences of AI energy consumption, the detrimental impact of AI overreliance on education, or the adverse effects of AI companions on human relationships are just a few examples of possible harms that rarely present as individual incidents. While worthy of analysis, these types of harms are not within the scope of this study.


AI Harm Mechanisms

Intentional Harm

Harm by Design

AI systems designed with the intention of causing harm represent the most straightforward of the six harm mechanisms. In this case, the developer designs the AI system to perform an inherently harmful task or to be used in harmful ways. Developers’ and users’ intentions to cause harm are generally aligned in incidents of harm by design, though as the following examples illustrate, types of harm by design systems can vary widely.

Some AI systems developed for defense and law enforcement, such as AI-enabled intelligence analysis and battlespace management systems used for targeting, or autonomous weapons systems with AI-enabled navigation, computer vision, or terminal guidance, are obvious examples of harm by design systems. No malfunctions or misuse need to occur for harm to materialize when these AI systems are used, since harm is the intended outcome, both by developer and deployer. Militaries may appropriately use these systems against lawful combatants when deployed in accordance with the law of armed conflict. Recent conflicts involving Ukraine and Israel have reportedly seen AI-enabled systems capable of causing harm deployed in combat.9

But harm by design is also prevalent outside of law enforcement and defense contexts. Deepfake apps that allow users to maliciously create nonconsensual intimate imagery (NCII) abound. There are dozens of such incidents recorded in the AIID—far too many to describe individually.10 The harm overwhelmingly affects women and girls, and as such, these incidents provide tangible evidence of a fast-growing form of gender-based digital violence.11 While this is not a new problem nor even a new AI capability (image generators have been around since approximately 2017), incidents involving pornographic content have surged since AI “nudify” apps and pornographic video generators have become more widely available online.12

There are also more subtle forms of intentionally harmful algorithmic design. Online marketplaces such as Naver, Coupang, and Amazon India have been accused of engaging in unfair competitive practices through algorithmic manipulation.13 The companies allegedly rigged the recommender systems and search algorithms powering their platforms to favor their own products and brands, boosting their market share and causing economic and financial harm to their competitors. Exploiting their dominance as an online market platform to promote their own brand as a vendor violates antitrust laws and, given the platforms’ wide online reach and customer base, the overall impact and scale of harm from even minor manipulation can be significant.

Mitigating Harm by Design

In general, the choice of approach to addressing harm by design depends on whether the intended harm is considered socially acceptable or necessary. Prohibiting the development and deployment of AI systems for certain use cases can be an effective measure for use cases causing unacceptable harms.

In contexts where harm by design is deemed acceptable, such as defense and law enforcement functions, the goal of governance is not to prevent harm entirely but to reduce it to what is necessary in a clearly defined and contestable framework. Institutional policy can help ensure the responsible deployment of the technology in order to prevent excessive harm and negligent use or abuse. Generally, organizations should establish AI governance principles that clearly define the circumstances and conditions under which reliance on autonomous or AI-powered decisions and actions is acceptable. A solid accountability framework with clearly assigned roles and responsibilities can ensure that any decision-making that leads to harm is transparent and tractable. Institutional oversight bodies, assuming sufficient independence and transparency, should then be authorized to investigate and audit AI-supported decision-making and actions taken when violations of those policies and frameworks occur or are suspected.

AI Misuse

AI systems that have not been developed with the explicit goal of doing harm can still be misused for that purpose. Compared to the harm by design mechanism, where the intent to harm lies with both the developer and user, in cases of AI misuse the intent to harm lies with the user or operator of the AI system only. Note that AI models can also be misused for non-malicious purposes, such as using AI to do homework.14 While this may cause users to unintentionally harm themselves—for example by detrimentally affecting their own learning—this section is only concerned with intentionally harmful, malicious misuse.15

Both specialized algorithmic systems and general-purpose AI models are prone to malicious misuse, although incidents involving the latter are more common in the AIID. General-purpose AI models, including large language models and text-to-image generators, perform many different tasks well, which makes them particularly easy to misuse for a range of purposes.

For example, in 2023, users of the online forum 4chan created hateful and violent voice impersonations of celebrities using ElevenLabs’ voice synthesis AI model.16 More recently, Microsoft and OpenAI reported on how state-sponsored hackers from North Korea, Iran, Russia, and China had misused ChatGPT for phishing and social engineering attacks targeting defense, cybersecurity, and cryptocurrency sectors.17 Other investigations revealed that ChatGPT had been misused by cybercriminals to create malware and other malicious software.18

Specialized AI systems, which generally serve a single particular purpose, can also be harmfully misused. Ranking online search results provides a troubling example. Malicious actors can be especially effective at exploiting search result ranking with AI systems that exhibit full or very high levels of autonomy, misusing them to achieve harmful, nefarious outcomes.

For instance, antisemitic online groups tagged images of portable ovens on wheels with the label “Jewish baby stroller.”19 As a result, if users searched for the term “Jewish baby stroller,” Google’s algorithm ranked images of the ovens at the top of the search results. This was a direct exploitation of the image search algorithm’s functionality, which works by matching the words in a query to the words that appear next to images on a webpage. The strategy succeeded particularly well because of a “data void” related to the search term: Because the product “Jewish baby stroller” doesn’t actually exist, the only results available were the offensive images, which were then promoted by the algorithm.20
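The data-void dynamic can be illustrated with a toy ranking function. The sketch below scores images by how many query terms appear in the text surrounding them; when no legitimate page matches a query, whatever adversarially tagged content does match rises to the top. All names and data here are hypothetical placeholders, and real image search ranking is far more complex:

```python
# Toy ranker illustrating a "data void": images are scored by how many
# query terms appear in the text surrounding them on a page.
def rank_images(query, index):
    terms = set(query.lower().split())
    return sorted(
        index,
        key=lambda img: len(terms & set(img["context_text"].lower().split())),
        reverse=True,
    )

index = [
    # Attackers tag an unrelated image with a phrase no real product uses.
    {"url": "offensive.jpg", "context_text": "glass mountain bicycle"},
    {"url": "bike.jpg", "context_text": "carbon mountain bicycle sale"},
    {"url": "vase.jpg", "context_text": "hand blown glass vase"},
]

# No legitimate page matches all three query terms, so the maliciously
# tagged image ranks first for the exact query.
print(rank_images("glass mountain bicycle", index)[0]["url"])  # -> offensive.jpg
```

The exploit requires no compromise of the system itself: the ranker behaves exactly as designed, and the attack works purely by shaping the data the ranker matches against.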

Malicious users have carried out similarly coordinated activities to trigger content moderation algorithms into removing marginalized creators’ social media posts, a tactic known as “adversarial reporting.”21 Because content on social media sites is sometimes automatically removed when a sufficiently high number of users flag a post, regardless of whether or not the post actually violates any policies, right-wing trolls have strategically reported posts by influencers belonging to minority groups on TikTok in order to trigger the platform’s content moderation algorithm.22 Even if TikTok’s appeal and review process finds that the video did not violate community guidelines, penalties become more severe the more frequently a creator’s posts are flagged, and can range from content removal to account deletion. Automated content moderation systems are thus exploited to effectively censor marginalized communities online.
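The flag-threshold behavior that adversarial reporting exploits can be sketched in a few lines. This is a deliberately minimal, hypothetical moderation rule with made-up field names and threshold, not any platform’s actual system:

```python
# Toy moderation rule: a post is automatically removed once user flags
# reach a fixed threshold, with no check of whether it violates policy.
# Coordinated "adversarial reporting" exploits exactly this gap.
FLAG_THRESHOLD = 10

def auto_moderate(post):
    if post["flags"] >= FLAG_THRESHOLD:
        post["removed"] = True
    return post

post = {"creator": "minority_influencer", "violates_policy": False,
        "flags": 0, "removed": False}

# Twelve coordinated accounts flag a perfectly compliant post...
post["flags"] += 12
auto_moderate(post)

# ...and the automated system removes it anyway.
print(post["removed"])  # -> True
```

The vulnerability is structural: as long as removal keys on flag volume rather than on a policy judgment, any group large enough to cross the threshold can weaponize the system.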

The Limits of Technical Mitigations of Misuse Risk

Developers of generative AI models can take steps to control model outputs so as to limit the generation of harmful content.23 Risk-based release strategies that restrict access to particularly capable or advanced models can further help address misuse risks.24 Assessing a model’s propensity for misuse requires evaluating whether it can perform a given malicious task (plausibility) and, if so, how well it can do so (performance).25 Red-teaming has emerged as a popular method to uncover misuse plausibility across a wide range of domains, and to surface where additional safeguards are needed.26 Performance assessments should include benchmark evaluations and experiments, and focus on the marginal utility of the model compared to existing modes for task execution.27

Putting in place comprehensive safeguards is exceptionally challenging since it is difficult for developers to anticipate all potential (mis)uses of their model. Most importantly, such interventions at the model level will not necessarily prevent misuse without deteriorating model performance—also known as the Misuse-Use Tradeoff.28 Because AI models lack the context to understand malicious intent, guardrails that prevent them from, for example, writing phishing emails will likely stop them from writing other emails as well. The same holds true for writing code: Guardrails to prevent malware may reduce the quality of innocuous computer programs. At the current state of the art, building an AI system that can never be misused often means building a system that is barely useful for non-malicious purposes. While there are other steps AI developers and deployers can take to prevent model misuse, technical fixes alone will not eliminate misuse risks.

Attacks on AI Systems

Harm can also result from cyberattacks on AI systems.* As with the misuse mechanism, the AI system developers and deployers do not intend harm here. Instead, the harmful intentions lie with the attackers. The cybersecurity community categorizes attacks into three groups: confidentiality, integrity, and availability attacks.29 Confidentiality attacks aim to extract sensitive information, integrity attacks aim to compromise the model’s performance, and availability attacks aim to halt the overall functioning of the model. Lately, an emerging category of attacks aims to circumvent generative models’ safeguards.30 Such exploitations of AI systems’ security vulnerabilities can potentially lead to harmful outcomes. Moreover, in addition to the standalone harms they cause, attacks on AI systems can enable model misuse.

* Adversarial attacks on AI models can occur at every stage of the AI lifecycle. Since relevant incidents in the AIID result from attacks on deployed systems, this section only covers post-deployment attacks.

While there is ample evidence of the security vulnerabilities of AI systems, most attacks recorded in the AIID still occur in experimental settings that do not lead to real-world harm. For example, security researchers have uncovered vulnerabilities in GitHub Copilot that would enable attackers to modify Copilot’s responses or leak the developer’s data (confidentiality and integrity attacks).31 Experiments showed that flaws in Tesla’s autopilot could be exploited to make the car accelerate and veer into the oncoming traffic lane (an integrity attack).32 Finally, an investigation found that a divergence attack on ChatGPT could force the system to leak training data, including personally identifiable information such as phone numbers and email and physical addresses (a confidentiality attack).33

Harm incidents from the AIID show that in practice, attacks on AI systems are often carried out to evade generative AI model safeguards. This practice, called “jailbreaking,” relies on prompt injection attacks in which users come up with text prompts that induce the AI model to behave in ways that violate its policies.34 Prompt injection attacks enabled users to evade ChatGPT’s guardrails shortly after its release in order to produce discriminatory and violent content, as well as offer instructions on how to carry out criminal activities.35 Even after models have undergone extensive safety testing and red-teaming, prompt injection attacks remain a popular and effective technique to circumvent guardrails. Hackers using large language models for malware creation employ dozens of prompting strategies for various
