Bridging chemistry and artificial intelligence by a reaction description language
Article in Nature Machine Intelligence · May 2025
DOI: 10.1038/s42256-025-01032-8
Received: 15 May 2024
Accepted: 4 April 2025
Jiacheng Xiong¹,²,⁶, Wei Zhang¹,²,⁶, Yinquan Wang¹,³, Jiatao Huang⁴, Yuqi Shi¹,², Mingyan Xu¹,², Manjia Li¹, Zunyun Fu¹, Xiangtai Kong¹,², Yitian Wang¹,², Zhaoping Xiong⁵ & Mingyue Zheng¹,²
With the fast-paced development of artificial intelligence, large language models are increasingly used to tackle various scientific challenges. A critical step in this process is converting domain-specific data into a sequence of tokens for language modelling. In chemistry, molecules are often represented by molecular linear notations, and chemical reactions are depicted as sequence pairs of reactants and products. However, this approach does not capture atomic and bond changes during reactions. Here, we present ReactSeq, a reaction description language that defines molecular editing operations for step-by-step chemical transformation. Based on ReactSeq, language models for retrosynthesis prediction may consistently excel in all benchmark tests, and demonstrate promising emergent abilities in the human-in-the-loop and explainable artificial intelligence. Moreover, ReactSeq has allowed us to obtain universal and reliable representations of chemical reactions, which enable navigation of the reaction space and aid in the recommendation of experimental procedures and prediction of reaction yields. We foresee that ReactSeq can serve as a bridge to narrow the gap between chemistry and artificial intelligence.
Artificial intelligence technologies, represented by large language models (LMs), have achieved unprecedented breakthroughs in natural language processing, influencing the model of scientific research¹,². In the life sciences domain, LMs are now used to mine hidden information from protein and gene sequences, achieving remarkable results. Notable examples include ESM, which interprets the protein functions and structures from their sequences³,⁴, and Geneformer, which predicts gene functions and interactions⁵. In chemistry and pharmaceuticals, the important concept of chemical LMs (CLMs), which handle chemical molecules and reactions, has also emerged⁶–⁸.
Unlike natural languages, proteins and genes, chemical molecules lack inherent sequential representations. CLMs capitalize on chemist-defined molecular linear notations to learn and generate molecular structures. The most commonly used molecular linear notation is the simplified molecular input line entry system (SMILES)⁹. Recently, to enhance the performance of CLMs in specific tasks, some new molecular linear notations were designed. For instance, SELFIES was developed to help CLMs produce valid molecular structures¹⁰, and PSMILES was introduced to facilitate the learning of polymer representations¹¹.
However, these molecular linear notations are all designed to describe the static structures of chemical molecules. They cannot explicitly describe the crucial aspect of chemistry, namely the process of atom and bond changes in molecules during a chemical reaction¹². This significantly restricts the application of LMs in chemical reaction prediction and representation. Current LMs for chemical reaction prediction, involving forward and retrosynthesis prediction, typically directly translate the linear notations of products and reactants into each other, which has been consistently criticized for lacking interpretability and interactivity (Fig. 1, top)¹³–¹⁵. Recently, Wang et al.¹⁶ and Thakkar et al.¹⁷ decomposed the transformation from product to reactant into two sequential steps: first, translating the product into synthons using a transformer, and then translating these synthons into reactants with another transformer. While this two-stage design improves the model’s interpretability and interactivity, it also increases the model’s complexity and compromises its end-to-end properties. Additionally, due to the limitations of SMILES syntax, these methods can only indicate the atoms involved in the reaction without detailing their specific changes. Furthermore, while pretrained LMs excel in representation learning of various sequence data¹⁸,¹⁹, similar advancements for chemical reactions are notably lacking. Existing CLMs can learn atom transformation from unmapped reaction data²⁰. However, generating meaningful vector representations for this transformation process remains challenging. Current self-supervised reaction representations still struggle to effectively capture the similarities between different reactions²¹.
¹Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China. ²University of Chinese Academy of Sciences, Beijing, China. ³Department of Medicinal Chemistry, School of Pharmacy, Fudan University, Shanghai, China. ⁴School of Physical Science and Technology, ShanghaiTech University, Shanghai, China. ⁵ProtonUnfold Technology Co. Ltd, Suzhou, China. ⁶These authors contributed equally: Jiacheng Xiong, Wei Zhang. e-mail: myzheng@
Fig. 1 | Overview of this work. A comparison between the previous LM for reaction prediction based on SMILES and our proposed method based on ReactSeq.
Top panel, previous language model for reaction prediction (SMILES to SMILES): the LM relates the product C1(C)=C2CCCCC2=NN1C1=CC(OC(C)C)=C(Cl)C=C1F to the reactants CC(C1CCCCC1=O)=O.CC(OC1=CC(NN)=C(C=C1Cl)F)C. Labels: lack of interactivity, poor in reaction representation, lack of interpretability.
Bottom panel, our proposed method (SMILES to ReactSeq): the LM relates the same product to the ReactSeq C1(C)_C2CCCCC2!NN!1C1=CC(OC(C)C)=C(Cl)C=C1F<[O:1]><><[O:1]>, with editing operations labelled Break, Reduce and Attach. The string shown under 'Prompt encoding' is C1(C)=C2CCCCC2=NN1!C1=CC(OC(C)C)=C(Cl)C=C1F. Labels: human-in-the-loop, better reaction representation, explainable reasoning.
Therefore, to advance the application of LMs in chemistry, developing new languages for describing chemical reactions is necessary. Ideally, this language should make the reaction predictions more accurate and interpretable, enabling clarification of the transformation process of atoms and bonds. The prediction of the transformation process should be controllable, allowing chemists to guide LMs with their knowledge²². Moreover, this language should enable LMs to generate better reaction representations for diverse downstream tasks.
In this work, we introduce a reaction description language named ReactSeq, designed to meet the aforementioned objectives (Fig. 1, bottom). Inspired by the retrosynthesis process, ReactSeq defines both the product structure and the molecular editing operations (MEOs) required to transform it back into reactant molecules. These MEOs include the breaking and changing of chemical bonds, alterations in atomic charges, and the attachment of leaving groups (LGs), among others (Fig. 2a). In a ReactSeq-based retrosynthesis LM, the reactant is not generated token by token from scratch. Instead, it is transformed from the product molecule through these MEOs. This ensures precise atom mapping between the predicted reactants and the products, enhancing the model’s interpretability. Using ReactSeq, a vanilla transformer can achieve state-of-the-art performance in retrosynthesis prediction. Moreover, ReactSeq features explicit tokens denoting MEOs, enabling the encoding of human instructions. Our results show that human experts’ prompts can significantly enhance the model’s performance and even guide it in exploring new reactions. In addition, the embeddings of those MEO tokens provide a universal and reliable reaction representation. These self-supervised representations can naturally distinguish between reaction types and evaluate their similarity, facilitating similar reaction retrieval, experimental procedure recommendation and reaction yield prediction.
Overview of ReactSeq
ReactSeq consists of two parts: a header and a tail (Fig. 2a). The header includes the structural details of a target molecule and information on changes to its atoms and bonds, describing how to transform it into the corresponding synthons. The tail includes the structures of the LGs and their connection positions with the synthons, describing how to complete the synthons into reactants. In standard SMILES, tokens for double and triple bonds are visible, while tokens for single bonds are hidden. However, the hidden tokens can be specified using SMILES with explicit bonds (Fig. 2b). By replacing these bond tokens in SMILES with MEO tokens (for example, using an exclamation mark ‘!’ to represent breaking a bond), we obtain the header of ReactSeq that records the changes and breaks in chemical bonds. Some target molecules in retrosynthesis do not involve breaking or changing of bonds between heavy atoms. Instead, they are directly connected to LGs. In these cases, we initially convert the atom token to the explicit hydrogen mode, such as changing O to [OH], and then add a corresponding MEO token (~) to it. Furthermore, changes in chirality, charge and cis–trans isomerism are also defined in ReactSeq.
Fig. 2 | Illustration of ReactSeq. a, Several representative examples of ReactSeq describing MEOs in retrosynthesis, including bond breaking, bond changing, connecting to LGs and chirality change. b, Visualization of hidden tokens for single bonds in SMILES.
Examples shown in panel a (SMILES, then ReactSeq):
Break bond and connect to leaving group: C1(Cl)=NC=C(CBr)C(Cl)=C1, C1(Cl)=NC=C(C!Br)C(Cl)=C1<><C1(=O)CCC(=O)[N:1]1>
Change bond (change to double bond): C1=C2CCC(O)C2=CC=C1F, header C1=C2CCC(;O)C2=CC=C1F
Directly connect to leaving group: C1=CC(OCC(=O)O)=CN=C1, C1=CC(OCC(=O)[~OH])=CN=C1<[CH3:1]>
Change bond and change chirality (change to single bond, change to S-configuration): CC(=O)C1=CC=CC=C1, C[sC](_O)C1=CC=CC=C1<><>
Example shown in panel b: SMILES C1(Cl)=NC=C(CBr)C(Cl)=C1; SMILES with explicit bonds C1(-Cl)=N-C=C(-C-Br)-C(-Cl)=C-1, with atoms numbered 1 to 10.
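The header construction described above can be illustrated with a minimal sketch. It assumes a simplified tokenizer that covers only the characters in the Fig. 2b example, and the function names `tokenize` and `mark_bond` are hypothetical, not part of the authors' implementation. Starting from SMILES with explicit bonds, one bond token is replaced by an MEO token and the remaining single-bond tokens are hidden again.

```python
import re

# Simplified token pattern (an assumption for this sketch): bracket atoms are
# kept whole so that a '-' inside e.g. [O-] is never read as a bond token.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#\-()]")
BOND_TOKENS = {"-", "=", "#"}


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, rejecting unsupported characters."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def mark_bond(explicit_smiles: str, bond_index: int, meo: str) -> str:
    """Replace the bond_index-th bond token (0-based, left to right) with an
    MEO token, then hide the remaining single-bond tokens."""
    out, seen = [], 0
    for tok in tokenize(explicit_smiles):
        if tok in BOND_TOKENS:
            if seen == bond_index:
                tok = meo
            seen += 1
        if tok != "-":  # single bonds are implicit in standard SMILES
            out.append(tok)
    if not 0 <= bond_index < seen:
        raise IndexError("bond_index out of range")
    return "".join(out)


# Explicit-bond SMILES from Fig. 2b; bond 5 is the C-Br bond.
explicit = "C1(-Cl)=N-C=C(-C-Br)-C(-Cl)=C-1"
print(mark_bond(explicit, 5, "!"))  # C1(Cl)=NC=C(C!Br)C(Cl)=C1
```

The result matches the header of the first example in Fig. 2a. A real implementation would work on the molecular graph with atom indexes rather than on token positions; the string-level version is used here only to make the token replacement visible.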
To obtain the tail of ReactSeq, first, the atoms in the target molecules that could connect to LGs are identified, known as attachment points. These include atoms directly connected to LGs or involved in bond breaking or reduction. The LGs of each attachment point are enclosed in angle brackets and sorted based on the atomic indexes of their connected attachment points. Following these steps, a standard header-to-tail ReactSeq is obtained, maintaining high alignment with the SMILES of the target molecule. A related work to ReactSeq is the condensed graph of reaction (CGR), which represents chemical reactions as pseudo-molecules and can generate linear notations for these pseudo-molecules²³. CGR defines operations for atomic and bond changes but fails to provide operations for completing LGs, which restricts its capacity to depict unbalanced reactions. Additionally, the linear representation of CGR cannot ensure alignment with either the reactant or product SMILES. Further details about ReactSeq are provided in the Methods section.
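As a rough illustration of the header-to-tail layout, the sketch below splits a ReactSeq string into its header and the list of angle-bracketed LG entries. The function name `split_reactseq` is hypothetical, and the sketch assumes that '<' and '>' occur only as tail delimiters, which holds for the examples in Fig. 2a but is not stated as a general rule in the text.

```python
import re

# One angle-bracketed entry per attachment point; an entry may be empty.
LG_RE = re.compile(r"<([^<>]*)>")


def split_reactseq(reactseq: str) -> tuple[str, list[str]]:
    """Split a ReactSeq string into (header, list of leaving-group entries)."""
    cut = reactseq.find("<")
    if cut == -1:
        return reactseq, []
    header, tail = reactseq[:cut], reactseq[cut:]
    groups = LG_RE.findall(tail)
    # Sanity check: the tail must consist of angle-bracket groups only.
    if "".join(f"<{g}>" for g in groups) != tail:
        raise ValueError(f"malformed tail in {reactseq!r}")
    return header, groups


header, lgs = split_reactseq("C1(Cl)=NC=C(C!Br)C(Cl)=C1<><C1(=O)CCC(=O)[N:1]1>")
print(header)  # C1(Cl)=NC=C(C!Br)C(Cl)=C1
print(lgs)     # ['', 'C1(=O)CCC(=O)[N:1]1']
```

In this example the first entry is empty and the second carries the LG structure, in the order of the attachment points as described above.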
ReactSeq improves retrosynthesis prediction performance
To demonstrate the application of ReactSeq, we first used it for retrosynthesis prediction using a vanilla transformer without any additional modifications. Table 1 presents a comprehensive comparison of our proposed methods and other methods on the USPTO-50k dataset. Our model outperformed all others, regardless of whether reaction types were given.
Our model uses a two-stage retrosynthesis reasoning strategy that first identifies the reaction centre and then completes the synthons, corresponding to the header and tail components of ReactSeq, respectively. While many graph edit-based methods (for example, G2Gs (ref. 24), RetroXpert²⁵, GraphRetro²⁶) and the sequence-based method RetroPrime also use this strategy, they often underperform in top-k (k ≥ 3) accuracy. This might stem from their use of different models for each stage’s tasks, which disrupts the continuity of the information flow between the two tasks, leading to the accumulation of errors. In contrast, ReactSeq’s header-to-tail structure consolidates the descriptions of the two retrosynthesis stages into one sequence, enabling sequential processing of these two tasks with an end-to-end model, thus achieving state-of-the-art top-k accuracy. Furthermore, current graph edit-based methods such as Graph2Edits (ref. 27), GraphRetro²⁶ and RetroExplainer²⁸ formulate synthon completion as a classification problem, selecting LGs from a predefined vocabulary. This limits their ability to generate new LGs and explore new reactions. Conversely, our model generates SMILES of LGs token by token, allowing for the generation of new LGs absent from the training set, offering greater flexibility to discover new chemical transformations. However, this also introduces the risk of generating chemically infeasible LGs, as demonstrated in Supplementary Fig. 1. While three predicted LGs appear chemically plausible, with analogous reactions having been reported, the last one, the difluoromethanesulfonic acid group, has not been observed in existing reactions. This highlights the need for more careful evaluation of new predictions before execution.
Table 1 | Top-k accuracy (%) of our proposed method and other models on the USPTO-50k dataset

                        Reaction class unknown      Reaction class known
Model             Ref.  k=1    k=3    k=5    k=10   k=1    k=3    k=5    k=10

Template-based methods
Retrosim          52    37.3   54.7   63.3   74.1   52.9   73.8   81.2   88.1
Neuralsym         53    44.4   65.3   72.4   78.9   55.3   76.0   81.4   85.1
GLN               54    52.5   69.0   75.6   83.7   64.2   79.1   85.2   90.0
LocalRetro        44    54.2   76.8   80.4   90.3   -      -      -      -

Graph edit-based methods
G2Gs              24    48.9   67.6   72.5   75.5   61.0   81.3   86.0   88.7
RetroXpert        25    50.4   61.1   62.3   63.4   62.1   75.8   78.5   80.9
MEGAN             55    48.1   70.7   78.4   86.1   60.7   82.0   87.5   91.6
GraphRetro        26    53.7   68.3   72.2   75.5   63.9   81.5   85.2   88.1
G2Retro           56    53.9   74.6   80.7   86.6   63.1   84.2   88.5   91.7
Graph2Edits       27    55.1   77.3   83.4   89.4   67.1   87.5   91.5   93.8
RetroExplainer    28    57.7   79.2   84.8   91.4   66.8   88.0   92.5   95.8
NAG2G             57    55.1   76.9   83.4   89.9   67.2   86.4   90.5   93.8
RetroCaptioner    58    54.3   76.3   82.6   88.1   67.2   86.0   90.3   93.4

Sequence-based methods
SCROP             59    43.7   60.0   65.2   68.7   59.0   74.8   78.1   81.1
Aug. Transformer  45    53.2   -      80.5   85.2   -      -      -      -
Dual-TF           60    53.6   70.7   74.6   77.0   65.7   81.9   84.7   85.9
BARTSmiles        50    55.6   -      74.2   80.9   -      -      -      -
RetroPrime        16    51.4   70.8   74.0   76.1   64.8   81.6   85.0   86.9
Chemformer        47    54.3   -      62.3   63.0   -      -      -      -
R-SMILES          46    56.3   79.2   86.2   91.0   -      -      -      -
ReactSeq                58.9   80.5   86.4   91.4   68.5   89.2   93.1   95.9

Note: the hyphen represents that the corresponding results are not reported and bold represents the best result. The results for LocalRetro come from the author’s most recent update on GitHub.
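For readers unfamiliar with the metric in Table 1, top-k accuracy can be sketched as follows. This is a generic illustration, not the authors' evaluation script: it assumes predictions and references are already canonicalized strings so that exact string comparison is meaningful, and the function name `top_k_accuracy` and the toy data are hypothetical.

```python
def top_k_accuracy(ranked_predictions, references, ks=(1, 3, 5, 10)):
    """Percentage of test cases whose reference appears among the first k
    ranked predictions, for each k in ks."""
    hits = {k: 0 for k in ks}
    for preds, ref in zip(ranked_predictions, references):
        for k in ks:
            if ref in preds[:k]:
                hits[k] += 1
    n = len(references)
    return {k: 100.0 * hits[k] / n for k in ks}


# Toy example with placeholder strings standing in for reactant sets.
predictions = [["A", "B"], ["C", "D", "E"], ["X"], ["P", "Q"]]
references = ["A", "E", "Y", "Q"]
print(top_k_accuracy(predictions, references, ks=(1, 3)))
```

Because a hit within the first k predictions is also a hit within any larger k, the values in each row of Table 1 are non-decreasing from k=1 to k=10.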
We further analysed our model’s performance across various reaction types. For rare reaction types such as cyclization and functional group transformations, the model maintained strong performance, achieving top-ten accuracies of 84.6 and 91.3%, respectively (Supplementary Fig. 2). However, the accuracy significantly decreased to 65.6% for reactions involving stereochemical changes (Supplementary Table 1). This discrepancy likely stems from knowledge transferability. Cyclization and functional group transformation reactions, although underrepresented in our dataset, involve MEOs such as bond breaking and attaching LGs, which are common in other reactions. This allows the model to effectively transfer knowledge from more frequent reactions to predict these rarer types. In contrast, stereochemical changes involve unique rules that cannot be inferred from reactions without such transformations.
We also evaluated our method on the larger USPTO-MIT dataset, where it demonstrated superior performance with accuracy rates of 60.5, 78.5, 83.3 and 87.6% for the top one, three, five and ten predictions, respectively (Supplementary Table 2). These results highlight our method’s applicability to large-scale datasets. Moreover, to validate the generalizability of our model, we conducted an external evaluation using ELN, a real-world reaction dataset²⁹. Our model achieved state-of-the-art performance on this dataset (Supplementary Table 3), emphasizing its robust generalization capabilities. Ablation studies about our ReactSeq language and training strategy are provided in the Supplementary Information C.1.
ReactSeq enables interpretable retrosynthesis prediction
Conventional sequence-based retrosynthesis methods directly convert product SMILES into reactant SMILES, failing to describe the specific transformation process from product to reactants. ReactSeq addresses this limitation by dividing the retrosynthesis prediction into two phases: identifying the reaction centre and completing the synthons. This two-stage molecular editing approach simulates the retrosynthetic analysis workflow, aligning better with human expert intuition compared to the approach inspired by specific reaction mechanisms and offering greater generality²⁸.
Supplementary Table 4 showcases the performance of our model and other methods in the two stages of retrosynthesis. Our model achieved 73.1% top-one accuracy in identifying reaction centres and 77.6% in synthon completion, significantly surpassing previous methods. Among these two-stage retrosynthesis methods, RetroFormer is also sequence-based. However, limited by SMILES syntax, the method identifies only atoms involved in bond breaking when identifying the reaction centre, failing to capture the changes of other atoms and bonds. During the synthon completion stage, RetroFormer directly translates synthon SMILES into reactant SMILES, not clarifying the process of attaching LGs. Some sequence-based methods use the attention weights to indicate reaction centres and perform atomic mapping. However, the explanations provided by the attention mechanism do not guarantee consistency with the actual transformations performed by the model³⁰,³¹. In contrast, ReactSeq enables LMs to accurately track the changes of atoms and bonds throughout the reaction process without any modifications to the model architecture, offering a more streamlined and reliable solution for interpretable retrosynthesis prediction.
Figure 3a presents the process of generating ReactSeq using beam search by our retrosynthesis model, where the total probability is the product of the probabilities predicted at each step. Notably, the total prediction probability is mainly influenced by MEO tokens, which represent the dynamic transformation of decomposing product molecules and completing synthons. The remaining tokens, which describe the static molecular structure, are predicted with consistently stable probabilities. In contrast, the total prediction probability of the SMILES-based model is influenced by many tokens, suggesting a more intricate decision-making process (Supplementary Fig. 3). Furthermore, the prediction probabilities for the header and tail tokens in ReactSeq allow calculation of the model’s confidence in its predictions at each stage. There is a clear trend where the accuracy of predictions improves as the model’s confidence increases (Fig. 3b).
Fig. 3 | Interpretable retrosynthesis prediction with ReactSeq. a, Presentation of the inference processes of our method. P1 and P2 refer to the predicted probabilities for reaction centre identification and synthon completion, respectively, while Pt refers to the total probability for final prediction result. b, The relationship between model’s predictive confidence measured by predictive probability and its accuracy across different tasks.
Panel a plots log(total probability) against decoding steps for the rank-1 to rank-5 predictions. Values shown in the panel: P1 = 0.977 and 0.022; P2 = 0.637 (Cl), 0.358 (OH), 0.485 (Cl), 0.394 (Br) and 0.003 (O); Pt = 0.622, 0.350, 0.011, 0.009 and 0.002 for the rank-1 to rank-5 predictions. Panel b shows counts of correct and incorrect predictions against confidence for retrosynthesis prediction, reaction centre identification and synthon completion.
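The relationship between the per-token probabilities and the stage confidences can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function name `stage_confidences` is hypothetical, the tail is assumed to start at the first token beginning with '<', and the token list and per-token probabilities are made up so that the stage values reproduce P1 = 0.977 and P2 = 0.637 from Fig. 3a.

```python
import math


def stage_confidences(tokens, probs):
    """Return (P1, P2, Pt): header confidence, tail confidence and total
    probability, each computed as a product of per-token probabilities."""
    if len(tokens) != len(probs):
        raise ValueError("tokens and probs must have the same length")
    # Assumption: the tail starts at the first token that opens an LG entry.
    cut = next((i for i, t in enumerate(tokens) if t.startswith("<")), len(tokens))
    # Summing logs avoids underflow for long sequences.
    log_p1 = sum(math.log(p) for p in probs[:cut])
    log_p2 = sum(math.log(p) for p in probs[cut:])
    return math.exp(log_p1), math.exp(log_p2), math.exp(log_p1 + log_p2)


# Hypothetical decoding trace: only the MEO token '!' and the LG token carry
# probabilities noticeably below 1, mirroring the behaviour described above.
tokens = ["C", "!", "N", "<", "[Cl:1]", ">", "<", ">"]
probs = [1.0, 0.977, 1.0, 1.0, 0.637, 1.0, 1.0, 1.0]
p1, p2, pt = stage_confidences(tokens, probs)
print(round(p1, 3), round(p2, 3), round(pt, 3))  # 0.977 0.637 0.622
```

The product 0.977 × 0.637 ≈ 0.622 is consistent with the rank-1 value of Pt reported in Fig. 3a.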
ReactSeq enables prompt-based reaction prediction
In retrosynthesis prediction, human experts often have insights regarding the location and type of reaction that should occur. We believe that incorporating this expert knowledge through prompts can guide LMs to generate more accurate predictions. A key challenge of this process is linguistically encoding the diverse human prompts. Thakkar et al. achieved this by tagging atoms in SMILES¹⁷. However, their prompts were restricted to indicating bond-breaking positions, failing to encode other types of human prompt. Conversely, ReactSeq defines tokens representing various MEOs, enabling a more comprehensive and refined encoding of human prompts.
To demonstrate this, we trained a prompt learning model using ReactSeq. As shown in Fig. 4a, the model is capable of processing various types of molecular editing prompt encoded by ReactSeq, and is guided to perform specific reaction transformations. We further tested this prompt-based learning strategy on the USPTO-50k dataset, achieving 96.6% top-one accuracy in identifying reaction centres and 74.9% top-one accuracy in predicting final reactants, significantly outperforming the model without prompts (Fig. 4b). It is crucial to point out these results were obtained using fully accurate human prompts, which
Fig. 4 | a, Prompt examples for one product (‘What if I…’), each followed by the model’s prediction with that prompt.
Q1: break this bond? Prompt 1: N#CC1=CC=C(C(C(O)CC2=CC=C(F)C=C2)!N2C=NC=N2)C=C1
Q2: change this bond to double bond? Prompt 2: N#CC1=CC=C(C(C(;O)CC2=CC=C(F)C=C2)N2C=NC=N2)C=C1
Q3: change this bond to single bond? Prompt 3: N_CC1=CC=C(C(C(O)CC2=CC=C(F)C=C2)N2C=NC=N2)C=C1
b, Accuracy (%) of synthon prediction and reactant prediction for top-1, top-3, top-5 and top-10. c, A ‘Break’ prompt with outputs labelled Suzuki coupling and Grignard reaction.
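To make the prompt encoding in Fig. 4a concrete, the sketch below recovers which MEO a prompt encodes by comparing it with the product SMILES. It is only an illustration: the product string is inferred here from the prompts by removing the MEO tokens and restoring the triple bond, the function name `prompt_edits` is hypothetical, and a character-level diff from Python's standard `difflib` stands in for whatever alignment the authors use.

```python
from difflib import SequenceMatcher

# MEO tokens and their meanings as labelled in Fig. 2a and Fig. 4a.
MEO_NAMES = {
    "!": "break bond",
    ";": "change to double bond",
    "_": "change to single bond",
}

# Product SMILES inferred from the prompts in Fig. 4a (an assumption).
PRODUCT = "N#CC1=CC=C(C(C(O)CC2=CC=C(F)C=C2)N2C=NC=N2)C=C1"


def prompt_edits(product: str, prompt: str) -> list[tuple[int, str]]:
    """List (position in the product string, operation name) for every place
    where the prompt differs from the product SMILES."""
    matcher = SequenceMatcher(None, product, prompt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        added = prompt[j1:j2]
        if added not in MEO_NAMES:
            raise ValueError(f"difference {added!r} is not a known MEO token")
        edits.append((i1, MEO_NAMES[added]))
    return edits


prompts = {
    "Prompt 1": "N#CC1=CC=C(C(C(O)CC2=CC=C(F)C=C2)!N2C=NC=N2)C=C1",
    "Prompt 2": "N#CC1=CC=C(C(C(;O)CC2=CC=C(F)C=C2)N2C=NC=N2)C=C1",
    "Prompt 3": "N_CC1=CC=C(C(C(O)CC2=CC=C(F)C=C2)N2C=NC=N2)C=C1",
}
for name, prompt in prompts.items():
    print(name, prompt_edits(PRODUCT, prompt))
```

Each prompt differs from the product in exactly one token, which is what allows a single explicit MEO token to carry the expert's instruction.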