Deloitte.
Together makes progress
The pivot to tokenomics: Navigating AI's new spend dynamics
A note from the authors:
AI economics affect most organizations and the C-suite uniquely. This paper guides those familiar with AI tokens in making strategic choices. If you're just beginning your exploration of tokenomics, look for additional research soon.
Traditional total-cost-of-ownership frameworks miss the reality of AI
Volatile workloads, new infrastructure demands, and tokens as the practical unit of cost
Across industries, Generative AI (GenAI) has become the fastest-growing line item in most corporate technology budgets, already consuming up to half of IT spend in some firms.1 Cloud bills are rising nearly 20% year over year, driven by AI workloads.2 At the same time, geopolitical uncertainties are intensifying calls for data sovereignty and technology infrastructure independence, making many enterprises think about AI sovereignty and gaining greater control over their infrastructure.3 This is no longer a CIO operational issue; it is a CFO-and-board capital question about how to responsibly manage an investment of this scale and volatility.
Unlike prior technology waves governed by licenses or virtual machines, AI spend often scales in nonlinear and unpredictable ways. AI capabilities run on tokens: small chunks of data (text, image, or audio) that AI systems process in training, inference, and reasoning. Every AI interaction consumes tokens, and every token carries a cost.
The complexity of AI's economics hides within these tokens. Costs rise not only with user adoption but with workload design, algorithmic complexity, and infrastructure intensity. What exactly are the thresholds to move across different consumption choices? It depends on the organization. Roughly a quarter of respondents in a Deloitte 2025 survey4 of data center and power executives say they or their clients are ready to make the move off of cloud to alternatives as soon as costs reach just 26% to 50% of those alternatives, showing high sensitivity to even modest price changes, while others plan to wait until cloud costs exceed 150% of the cost of alternatives. The decision point remains unclear given the high variability patterns of AI technologies. For example, advanced reasoning models that keep context across multiple steps can consume much more compute than basic one-shot responses.
As NVIDIA projects a billion-fold surge in AI computing and Google now processes 1.3 quadrillion tokens a month5 (a 130-fold leap in just a year), the capital and energy implications are profound.
Traditional total cost of ownership (TCO) approaches are no longer the best way to manage AI economics. Leaders may be better served by precision economics: the ability to track, predict, and optimize spend at the token level. Tokens translate opaque infrastructure choices into tangible financial terms: the true cost of generating a dollar of revenue, margin, or productivity.
The competitive divide will not likely hinge on who adopts AI first, but on who manages its cost structure with discipline. AI spend will likely separate value creators from value eroders. The former convert tokens into measurable enterprise output; the latter accumulate ungoverned cost that compounds quietly across the stack.
The elusive AI ROI
Despite rising investment, many leaders appear to still be chasing measurable return on investment (ROI) from AI initiatives.
• Nearly half (45%) of 500 leaders surveyed in Deloitte's 2025 US Tech Value survey expect it will take up to three years to see return on investment from basic AI automation.6
• Six in 10 of those completing Deloitte's 2025 Tech Value survey believe more advanced AI automation will take even longer to reach ROI.
• Of the 1,326 global finance leaders surveyed for Deloitte Global's inaugural Finance Trends report, fielded May 2025, 28% said AI investments are delivering clear, measurable value.7
But the issue isn't whether AI will deliver value; it's how to measure and manage that value in a way ROI frameworks cannot. For many organizations, adopting AI is no longer optional; it's a strategic response to competitive or existential pressure.
That makes understanding the economics of AI (how costs, workloads, and returns flow through tokens) the new imperative for leaders.
Tokens: The new currency of AI
Unlike traditional pricing based on compute time, which is relatively static, token-based pricing ties cost directly to the actual work AI performs. Each token represents both a unit of computation and a unit of cost. In that sense, tokens are the true currency of AI economics, as indispensable to machine intelligence as kilowatt-hours are to electricity. The difference is that token demand is far harder to predict or control, making AI spend inherently volatile.
• Nonlinear demand: Complex reasoning models improve performance but can consume more tokens than simple inference tasks.
• Fluctuating token use: Token use fluctuates with experimentation levels, workload design, model choice, and even prompt engineering.
• Varying pricing: Token price keeps changing based on AI model capabilities and the efficiency of the underlying infrastructure.8
While this volatility appears to stem from usage patterns, its roots are in the tech stack. The compute, storage, and networking decisions that power AI models determine how efficiently tokens are processed, and how costly each one becomes.
A token is not just a technical measure; it is an economic signal. Each token carries the compound effect of GPU design, storage, throughput, network latency, and facility economics. The discipline lies in tracing lineage, from infrastructure to the AI model to outcome, and aligning those decisions so token costs stay proportional to business value.
How tokens are bought
AI spending is not a single market; it fractures into different economic realities depending on how organizations consume intelligence. Some leaders experience AI costs only as a software-as-a-service (SaaS) line item, others as metered application programming interface (API) calls, and a growing cohort manages it directly through infrastructure ownership, balancing GPUs, storage, networking, and energy.
Buying patterns
• Generating through packaged software abstracts tokens almost entirely. Leaders see a predictable subscription or per-seat fee, but little transparency into token consumption efficiency. The trade-off is less control in exchange for more simplicity.
• Consuming through APIs makes tokens explicit. Every query is metered, billed, and exposed. This brings transparency, but also volatility: Costs rise based on workload design, prompt length, and hidden choices of infrastructure providers, and they accrue in real time as the token meter runs.
• Running on owned infrastructure brings token economics fully in-house. Tokens become the outcome of decisions about GPUs, storage tiers, networking, and energy contracts. This approach demands high capital and technical capability but offers the greatest control over long-term cost structure and data sovereignty. The emerging shorthand for this strategy: the AI factory.
Each of these choices is grounded in existing and future technical and operating decisions given system cost, latency, security, and other needs, which change how tokens flow into enterprise profit and loss (P&L).9
What is an AI factory, and when does one make sense?
Deloitte defines an AI factory as a specialized infrastructure (compute, network, and storage) along with optimized software and services that enable the entire AI lifecycle at high-performance scale. The primary product is intelligence, measured by token throughput, which drives decisions, automation, and new AI solutions.
One of the hardest decisions enterprises face is whether to continue paying for tokens off premises (off-prem), through APIs or traditional SaaS companies, or to build an AI factory and self-manage the infrastructure. The economics vary sharply depending on scale, sensitivity, and predictability of demand:
• Off-prem (API or traditional SaaS companies): May be most efficient for early pilots, spiky or seasonal workloads, or use cases with low data sensitivity. Costs are typically higher per token but predictable and flexible, with no up-front capital expense (capex).
• AI factory: Can become attractive when workloads are large, predictable, latency-sensitive, and cross a threshold where building and operating infrastructure delivers lower effective token economics than continuing to rent it. Although capex investment may be needed, per-token costs fall as infrastructure is fully utilized, and sovereignty risks are controlled. Beyond the traditional on-premises (on-prem) or colocation (co-lo) providers, an AI factory can also be stood up using fast-growing cloud alternatives (neoclouds) to manage workload redistribution trends, as detailed in a recent Deloitte survey.10
The decision is not binary. For most global enterprises, the reality is hybrid. Smaller, less predictable, and exploratory workloads may stay in API form, while scaled, high-value workloads may run on an AI factory as applications scale and economics stabilize. AI model preference and selection may also drive enterprise decision-making.
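The threshold idea in this section can be made concrete with a simple break-even calculation. The sketch below is illustrative only: it assumes a flat annual fixed cost for self-hosting and constant per-token rates, and every number in it is hypothetical rather than taken from Deloitte's analysis.

```python
def breakeven_tokens_millions(fixed_cost_per_year: float,
                              self_hosted_cost_per_million: float,
                              api_price_per_million: float) -> float:
    """Annual volume (in millions of tokens) above which self-hosting is cheaper.

    Simplified model: API cost = price * volume;
    self-hosted cost = fixed cost + marginal cost * volume.
    """
    saving_per_million = api_price_per_million - self_hosted_cost_per_million
    if saving_per_million <= 0:
        # Self-hosting never catches up if its marginal cost is not lower.
        return float("inf")
    return fixed_cost_per_year / saving_per_million


# Hypothetical inputs, chosen only to illustrate the mechanics.
volume = breakeven_tokens_millions(
    fixed_cost_per_year=500_000.0,      # assumed annualized capex plus staffing
    self_hosted_cost_per_million=0.50,  # assumed marginal cost (power, etc.)
    api_price_per_million=3.00,         # assumed blended API rate
)
print(f"Break-even at about {volume / 1000:.0f} billion tokens per year")
```

Under these assumed inputs the crossover sits at roughly 200 billion tokens a year; a real model would also need utilization, hardware refresh cycles, and changing API prices.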
How tokens are priced
Once leaders understand their buyer type (generate, consume, run), the next challenge is to see how tokens are priced. The same AI model could be billed as a seat license, a token meter, or GPU-hours, depending on how it is consumed. There are three major components of token pricing:
1. The underlying tech stack
2. How it is hosted and consumed
3. What type of AI model and level of customization is required to power the solution
The AI tech stack
Every token processed by an AI model reflects a cascade of infrastructure decisions.
For packaged buyers, in most cases and at least for now, these costs are hidden. Costs are abstracted, bundled into familiar enterprise contracts, and vendor-managed across every layer of the tech stack, which makes unpacking TCO challenging.
For API consumers, every element of the AI tech stack shows up indirectly as per-token fees or throughput charges. Price varies by AI model accessed, with different input and output rates, usually quoted per million tokens. Discounted pricing options such as reserved token capacity, prompt caching, or batch execution rates are usually offered, while in some cases enterprise customers may also get user-based pricing. Additionally, storage or egress charges may further add to TCO.
For self-hosted solutions, tokens are not purchased at all; they emerge from explicit capex and operating expense (opex) decisions related to infrastructure choices (figures 1 and 2).
What changes across buyer types is not whether these costs exist (they always do) but who sees them, controls them, and pays for them.
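For the API consumer case, a metered bill can be approximated from input and output rates quoted per million tokens. The following is a minimal sketch, not any provider's actual billing logic: the `ApiRates` structure, the rates, and the way the caching discount is applied are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ApiRates:
    """Hypothetical per-million-token rates in USD."""
    input_per_million: float
    output_per_million: float
    cached_input_discount: float = 0.0  # assumed fraction off for cached prompt tokens


def api_cost(rates: ApiRates, input_tokens: float, output_tokens: float,
             cached_fraction: float = 0.0) -> float:
    """Estimate a metered bill; cached_fraction is the share of input tokens served from cache."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached input tokens are billed at a discounted rate, fresh ones at full rate.
    billable_input = fresh + cached * (1.0 - rates.cached_input_discount)
    input_cost = billable_input * rates.input_per_million / 1_000_000
    output_cost = output_tokens * rates.output_per_million / 1_000_000
    return input_cost + output_cost


# Illustrative numbers only.
rates = ApiRates(input_per_million=2.0, output_per_million=8.0, cached_input_discount=0.5)
no_cache = api_cost(rates, input_tokens=1e9, output_tokens=2e8)
with_cache = api_cost(rates, input_tokens=1e9, output_tokens=2e8, cached_fraction=0.4)
print(f"Without caching: ${no_cache:,.0f}; with 40% cached input: ${with_cache:,.0f}")
```

Separating input, output, and cached volumes makes it visible how prompt design, and not only user count, moves the bill.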
Figure 1. How technical decisions can drive token costs and implications for an AI factory
STACK COMPONENT | TOKEN IMPLICATIONS | SELF-HOSTED AI FACTORY
Compute: Graphics processing units (GPUs) and accelerators | Modern GPUs and high-bandwidth memory shorten time per token but come with higher acquisition or rental cost. | Largest direct cost; direct infrastructure spend; rapid release cycles
Storage: High-speed data access | AI workloads stream terabytes using nonvolatile memory and parallel file systems to sustain performance and manage cost. Legacy storage inflates per-token costs by adding latency as GPUs wait for data. | Nonvolatile memory, parallel file systems, vector databases; heavy investment
Networking: GPU interconnects (InfiniBand, NVLink, PCIe Gen5) | Training across thousands of GPUs requires ultra-low-latency interconnects to cut idle cycles and lower cost per token, while traditional approaches often drive token costs higher. | Direct spend
Power and cooling: Energy intensity of AI racks | A single next-generation GPU rack can draw 250–300 kW, compared with 10–15 kW for non-AI servers. Whether billed directly (on-prem) or embedded in cloud pricing, this power use shows up in every token consumed. | High opex (250–300 kW racks); liquid cooling requirements
Facilities: Physical infrastructure requirements | Heavier racks (up to 3,000 lb,11 nearly 40% more than traditional) may need reinforced flooring and advanced cooling, costs that are embedded in every token. | Direct capex (reinforced floors, racks)
Operational costs | Related to staffing and operations: IT ops and management; software and licensing; application development and integration; data management and governance; inference and serving; security and compliance; user training and change management | Full machine learning operations (MLOps) costs; full center of excellence (COE) and upskilling; orchestration frameworks and MLOps tools (data, orchestration, security); direct compliance spend, etc.
Source: Deloitte analysis based on project experience
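To give a sense of scale for the power row in figure 1, the sketch below converts rack power draw into annual energy cost. It assumes continuous draw at the stated power for every hour of the year and a hypothetical electricity price of $0.10 per kWh; neither assumption comes from the source, and cooling overhead is ignored.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def annual_energy_cost(rack_kw: float, price_per_kwh: float) -> float:
    """Annual electricity cost in USD for a rack drawing rack_kw continuously."""
    return rack_kw * HOURS_PER_YEAR * price_per_kwh


PRICE = 0.10  # hypothetical USD per kWh, for illustration only

ai_rack = annual_energy_cost(250, PRICE)       # low end of the 250–300 kW range
classic_rack = annual_energy_cost(15, PRICE)   # high end of the 10–15 kW range
print(f"AI rack: ${ai_rack:,.0f}/yr; conventional rack: ${classic_rack:,.0f}/yr")
print(f"Ratio: about {ai_rack / classic_rack:.0f}x")
```

Even comparing the low end of the AI range with the high end of the conventional range, the energy bill per rack differs by more than an order of magnitude under these assumptions.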
Hosting models
How tokens are priced also depends on where and how AI models are hosted. The same large language model (LLM) can be deployed via on-prem, colocation, hyperscalers, or API access, with radically different economics. For a package buyer, this decision is again invisible and resides with the vendor. For the API consumer, it can vary based on which of the many models on the market is being consumed, and this explains why the same task may cost more depending on the provider. For self-hosted AI infrastructure users, all hosting types are possible, and it is often the most important determinant of unit economics.
Figure 2. GPU consumption models and cost structure
ATTRIBUTE | ON-PREM | NEOCLOUD PROVIDERS | HYPERSCALER | API ACCESS
Capex vs. opex | High capex/low opex | Pure opex | Pure opex | Pure opex
Unit cost of compute (GPU/hour) | Lowest: ~$1–$2 | Medium: ~$1–$4 average, but high variability, on demand | High: ~$3–$7, region/model dependent | Very high: $0.40–$100 or more per million output tokens
Scalability | Medium: Slow due to procurement, power, and setup | High: Dynamic resource provisioning | Medium/high: Dynamic scaling with near-infinite top-end | Very high: 100% managed by the provider
Latency | Lowest: Full control over hardware stack | Low: Purpose-built for AI, but physical layout not controllable; with neoclouds, low physical proximity is manageable | Medium: Near-zero control over physical layer and workload placement | Medium/high: No control over provider infrastructure/network, with long-distance communication
Control and customization | Full | Medium: No control over physical layer or maintenance; high control over what's hosted | Medium: Treated identically to neocloud providers | Very low: No control over infrastructure layer and limited control over AI model tuning, format of response
Security and data sovereignty | Highest: Complete control over data encryption, transit, storage | High: Treated identically to co-lo; neoclouds offer higher data encryption | Medium: Data leakage risk and low control over exact hosting location | Low: No control over provider architecture or governance practices
Deployment time | Long: Multi-month procurement, delivery, and setup | Instant | Instant | Instant
Maintenance responsibility | Customer: Managed services and shared responsibility model (e.g., facilities, energy, etc.) | Shared: Physical infrastructure: provider; all other layers: customer | Shared: Physical infrastructure: provider; all other layers: customer | AI model provider
Best use cases | Stable, high-throughput workloads | Elastic compute, proofs of concept (POCs), cost-sensitive workloads; neoclouds may bring added functionality for data-sensitive workloads | Elastic compute, POCs | Fast experimentation, agents, retrieval-augmented generation (RAG)
Source: Deloitte analysis based on public and proprietary estimations, including publicly available GPU pricing data, API pricing benchmarks, and hyperscaler cost calculator references. Indicative references include public GPU cost analysis and total-cost-of-ownership models (e.g., semi-analysis AI TCO framework); public API pricing benchmarks for Generative AI models (e.g., representative GPT-5 family rates); hyperscaler compute pricing estimates derived from standard cloud cost calculators
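Figure 2 quotes self-hosted options in GPU-hours and API access in dollars per million tokens, so comparing them requires a conversion. A rough sketch of that conversion follows; the throughput and utilization values are hypothetical placeholders, since real figures depend heavily on the model, hardware, and batching.

```python
def cost_per_million_tokens(gpu_hour_price: float,
                            tokens_per_second_per_gpu: float,
                            utilization: float) -> float:
    """Effective USD per million tokens for one rented or owned GPU.

    utilization is the fraction of each hour the GPU spends serving tokens.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second_per_gpu * 3600 * utilization
    return gpu_hour_price / tokens_per_hour * 1_000_000


# Hypothetical: $2 per GPU-hour and 1,000 tokens/s per GPU.
for u in (0.25, 0.5, 1.0):
    cost = cost_per_million_tokens(2.0, 1000.0, u)
    print(f"utilization {u:.0%}: ${cost:.2f} per million tokens")
```

The same hourly price yields very different per-token costs as utilization changes, which is one way to read the point that per-token costs fall as infrastructure is fully utilized.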
Ultimately, the cost structure follows the architecture. Compute density, network proximity, and storage throughput each influence how efficiently tokens are processed, and therefore where a model should live. The decision isn't about speed or preference; it's about matching workload physics to business economics. In our experience, we've found hybrid architectures sustain performance without inflating token costs.
AI model selection
AI model strategy is a second decision point: open-source or closed (proprietary) AI models. Package buyers inherit whatever the vendor builds. API users can choose providers but not the models' economics. Only self-hosted AI factory users control the full trade-off across cost, flexibility, and sovereignty.12
Open-source AI models
Open-source models are generally free and typically run in self-hosted environments, giving enterprises greater control, customization, and data sovereignty. They are well suited for fine-tuning on proprietary or sensitive data, minimizing vendor lock-in, and lowering token costs over time.
Examples include Meta Llama, Mistral, and others. Emerging frameworks such as NVIDIA NIM Microservices illustrate how vendors are packaging open-source models into standardized, secure deployment units, bringing operational discipline to what was once bespoke integration work.
Proprietary (closed) AI models
These are consume-as-you-go models, typically billed per token, that let users hit the ground running with no up-front investment. They are pretrained, have strong out-of-the-box functionality, and provide access to vendor support for operations. Examples of such AI models include Anthropic Claude, Google Gemini, OpenAI GPTs, xAI Grok, and others. However, this typically comes with higher per-token cost, lower cost predictability due to fluctuating token usage, lack of customization, open concerns around data storage, and risk of vendor lock-in.
Decoding the AI cost curve
AI economics follow Jevons' paradox: As efficiency improves, total consumption rises.13 Token prices are falling fast (what once cost dollars per thousand now costs pennies per million), and Deloitte projects the average inference cost will drop from $0.04 per million tokens in 2025 to about $0.01 by 2030.14
Yet enterprise spending continues to surge.15 As agentic systems and multiagent workflows proliferate, token demand grows exponentially, often faster than infrastructure efficiency gains can offset. The paradox isn't that AI is becoming cheaper; it's that efficiency itself is driving expansion. Without disciplined cost governance, total costs grow.
Who pays the bill?
The cost curve doesn't affect every participant the same way. As token consumption accelerates, the question becomes who ultimately absorbs that spend (the enterprise, the vendor, or the end user) and how those dynamics evolve as workloads scale and grow more complex. Deloitte's TCO analysis examines exactly where and when those costs shift.
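The interaction between falling prices and rising volume can be checked with simple arithmetic. The sketch below uses the projected per-million-token prices quoted above ($0.04 in 2025, about $0.01 by 2030); the volume multiples are hypothetical scenarios, not forecasts.

```python
def spend_multiple(price_start: float, price_end: float, volume_multiple: float) -> float:
    """Ratio of ending spend to starting spend when spend = price * volume."""
    return (price_end / price_start) * volume_multiple


def breakeven_volume_multiple(price_start: float, price_end: float) -> float:
    """Volume growth at which the price decline is exactly offset."""
    return price_start / price_end


P_2025, P_2030 = 0.04, 0.01  # USD per million tokens, as projected in the text

print(f"Volume growth that cancels the price drop: {breakeven_volume_multiple(P_2025, P_2030):.0f}x")
for growth in (2, 10, 100):  # hypothetical volume multiples
    print(f"{growth}x volume -> {spend_multiple(P_2025, P_2030, growth):.2f}x spend")
```

With a 75% price decline, any volume growth beyond fourfold raises total spend, which is the mechanism behind the paradox described here.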
The token TCO estimation and scenario analysis
To quantify these dynamics, Deloitte conducted a detailed token TCO analysis designed to capture how AI's underlying economics shift across the full tech stack. The analysis tested how total cost of ownership evolves along three critical dimensions that shape token pricing:
1. Technology stack: The GPUs, AI models, and architectures powering AI workloads.
2. Hosting approach: Comparisons as usage and complexity scale over time.
3. Usage scaling: Increase in the overall token consumption driven by increase in user count or the complexity/depth of reasoning each use case demands.
The objective was to understand how these factors interact to redefine organizational strategy based on what the key drivers of AI TCO are, how costs evolve as usage scales, and where the inflection points emerge in cost per token. Before presenting the outcomes, the next section outlines the key assumptions and configurations underpinning the model used in our tests.
Model assumptions
The model was built to test realistic, enterprise-scale conditions rather than idealized lab settings.16 While it can accommodate a wide range of configurations, the version summarized here reflects a representative scenario across common enterprise workloads.
The baseline configuration included:
• Compute stack: NVIDIA HGX B200 GPU Server (NVLink/NVSwitch Enabled) | CPU: AMD EPYC 9654.
• LLM: Llama 3.3 70B FP8 TP2, GPT-4o selected because a variety of common configurations were being tested.
• Hosting models: On-prem, API access, specialized neocloud providers (NCPs). NCPs offer hourly rates as well as reserved contracting for different periods. In this model, we assumed hourly and not reserved pricing.
This setup enabled Deloitte to isolate how hosting choices, AI model selection, and usage maturity interact to drive token consumption and total cost. The following analysis highlights the resulting cost curves and inflection points that emerge as usage scales. The analysis simulates growth scaling in increments of 8 GPUs (figure 3).
Figure 3. Scenario complexity and token assumptions driving four-year TCO dynamics
TOKEN SCENARIOS | EXAMPLE SCENARIO DESCRIPTION/USE CASE
YEAR 1: Pilot stage | Initial deployment of simple use cases such as chatbot or FAQ assistant: A lightweight conversational AI used for customer service, HR inquiries, or basic IT help desk support. Handles short, structured Q&A with minimal context retention.
YEAR 2: POC/lightweight adoption | Scaling to include knowledge-driven use cases such as document summarization and knowledge search: Internal enterprise assistant that retrieves and summarizes policy documents, proposals, or contracts. Includes semantic search and multiturn conversations.
YEAR 3: Inferencing at scale | Maturing to drive decision-support use cases such as an analytics co-pilot: Assists consultants, analysts, or auditors in generating insights, drafting reports, or performing data analysis across multiple data sources. Includes reasoning, structured output, and integration with enterprise systems.
Source: Deloitte analysis
Navigating the economics of an accelerating technology environment
The rapid pace of AI hardware advancement has created obsolescence cycles that far outpace traditional depreciation schedules, with GPU generations now refreshing rapidly. For example, recent model releases quickly outgrew the capabilities of previously leading GPUs to unlock features, while legacy support for older hardware diminishes. Newer GPUs that switch to an annual release cycle further accelerate these refresh demands, challenging enterprises to continually balance the benefits of faster upgrades with the risk of falling behind.
Such recent advances in GPU technology have enabled AI applications requiring larger context lengths, such as reasoning models, summarizing extensive text corpora, and high-fidelity multimodal tasks like analyzing hour-long videos. These use cases, including agentic reasoning, demand substantial GPU memory and the latest hardware to accurately process such complex or large-scale data.
However, adoption of multimodality and agentic reasoning at the enterprise level is in its early stages, and inference tasks often run well on older GPUs, especially for midsize models.
As token pricing for AI models declines and the economics of "build vs. buy" shift rapidly, enterprises cannot rely on static assumptions and should develop forward-looking infrastructure strategies: carefully planning upgrades, assessing costs, and ensuring investments remain viable as the market stabilizes over time.
Analysis outcome
The TCO simulation incorporated real-world parameters across the full AI value chain, from hardware utilization and energy costs to facilities expenses. Each variable was calibrated to reflect current market conditions and operational realities rather than theoretical efficiency.
This approach ensured a holistic view of cost behavior: how GPU utilization rates, power efficiency, and AI model complexity combine to shape effective cost per token. The resulting analysis surfaced the underlying mechanics of a new AI economy, one where technical decisions directly dictate financial outcomes.
1. Usage scaling and complexity drive hosting advantage.
In our TCO modeling, first-year workloads at 10 billion tokens favor the API access approach, because pay-as-you-go approaches minimize idle capacity costs. As the number of tokens rises in year two, the economics flip. At higher reasoning loads more tokens are consumed, and self-hosted AI factories outperform APIs as fixed infrastructure costs are absorbed and utilization increases. After four years, the simulation projected that cumulative TCO for API hosting would be twice that of an AI factory, given the same configuration and token scaling (figure 4).
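Figure 4 reports the AI factory's cost falling from $24K to $1.45K per billion tokens between year 1 (10 billion tokens) and year 3 (1,000 billion tokens). The short sketch below only re-derives what those stated figures imply; it is a consistency check on the published numbers, not a reproduction of Deloitte's simulation.

```python
def implied_annual_tco(cost_per_billion_usd: float, tokens_billions: float) -> float:
    """Annual TCO in USD implied by a unit cost and an annual token volume."""
    return cost_per_billion_usd * tokens_billions


def unit_cost_drop(cost_start: float, cost_end: float) -> float:
    """Fractional decline in unit cost between two periods."""
    return 1.0 - cost_end / cost_start


# Values as stated in figure 4 for the AI factory.
Y1_COST, Y1_TOKENS = 24_000.0, 10.0     # USD per billion tokens, billions of tokens
Y3_COST, Y3_TOKENS = 1_450.0, 1_000.0

print(f"Implied year 1 TCO: ${implied_annual_tco(Y1_COST, Y1_TOKENS) / 1e6:.2f}M")
print(f"Implied year 3 TCO: ${implied_annual_tco(Y3_COST, Y3_TOKENS) / 1e6:.2f}M")
print(f"Unit cost drop: {unit_cost_drop(Y1_COST, Y3_COST):.1%}")
```

Volume grows 100-fold while the implied annual TCO grows only about sixfold, which is how the unit cost can fall by more than 90%.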
Figure 4. Over 3 years, an AI factory is ~2.1x more cost-effective than API-based solutions
• AI factory averages ~150% annual TCO growth vs. >1,000% (API) and >800% (NCP), ensuring more stable, predictable, and manageable costs
• AI factory sees >90% drop in $/B tokens from Y1 to Y3 ($24K to $1.45K) vs. 64% (API) and 84% (NCP), becoming most cost-efficient at high scale
• Over a 3-year TCO, AI factory on-prem delivers more than 50% cost savings compared to both API-based and NCP solutions
[Bar chart: Annual total cost of ownership. Y-axis: annual TCO (USD in millions), 0.0M to 4.0M. X-axis: Year 1 (10 billion tokens), Year 2 (300 billion tokens), Year 3 (1,000 billion tokens, or 1 trillion). Series: AI factory, neocloud (NCP), API.]
Source: Deloitte simulation
Pay-as-you-go APIs and NCP are more suited to simple, low-volume workloads, while AI factory (self-hosted) is co