Machine Learning Question Bank
I. Maximum Likelihood
1. ML estimation of exponential model (10 points)
A Gaussian distribution is often used to model data on the real line, but it is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases an exponential distribution can be used instead; its probability density function is given by

p(x) = \frac{1}{b} e^{-x/b}, \quad x \ge 0.

Given N observations x_i drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate for b.
(a) L(\mathbf{x}; b) = \prod_{i=1}^{N} \frac{1}{b} e^{-x_i/b}

(b) \frac{\partial}{\partial b} \log L(\mathbf{x}; b) = \frac{\partial}{\partial b}\Big( -N\log b - \frac{1}{b}\sum_{i=1}^{N} x_i \Big) = -\frac{N}{b} + \frac{1}{b^2}\sum_{i=1}^{N} x_i

(c) Setting \frac{\partial}{\partial b}\log L(\mathbf{x}; b) = 0 gives \hat{b} = \frac{1}{N}\sum_{i=1}^{N} x_i.
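To make the result concrete, here is a small numerical check (a sketch, not part of the original solution; the true scale and sample size are made up): the maximizer found by a bounded scalar optimizer should coincide with the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.5, size=1000)   # N observations, true scale b = 2.5

def neg_log_likelihood(b):
    # -log L(x; b) = N log b + (1/b) * sum_i x_i
    return len(x) * np.log(b) + x.sum() / b

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x, x.mean())   # the two numbers should agree closely
```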
2. The same problem with a Poisson distribution instead:

\ell(\theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta) = \sum_{i=1}^{N} \log\Big( \frac{\theta^{x_i} e^{-\theta}}{x_i!} \Big)
= \sum_{i=1}^{N} x_i \log\theta - N\theta - \sum_{i=1}^{N} \log(x_i!)
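Setting the derivative of this log likelihood to zero completes the Poisson case (a short sketch of the step the notes stop short of):

```latex
\frac{\partial \ell(\theta)}{\partial \theta}
  = \frac{1}{\theta}\sum_{i=1}^{N} x_i - N = 0
\quad\Longrightarrow\quad
\hat{\theta}_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^{N} x_i .
```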
II. Bayesian Methods
1. Applying Bayes' rule
2. Suppose that on a multiple-choice exam an examinee knows the correct answer with probability p and guesses with probability 1 - p. Assume that an examinee who knows the correct answer answers correctly with probability 1, while one who guesses answers correctly with probability 1/m, where m is the number of choices. Given that the examinee answered the question correctly, find the probability that he knew the correct answer.
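One way to carry out the computation by Bayes' rule (writing K for "knows the answer" and C for "answers correctly"):

```latex
P(K \mid C)
 = \frac{P(C \mid K)\,P(K)}{P(C \mid K)\,P(K) + P(C \mid \bar{K})\,P(\bar{K})}
 = \frac{1 \cdot p}{1 \cdot p + \frac{1}{m}(1-p)}
 = \frac{mp}{mp + 1 - p}.
```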
3. Conjugate priors
Given a likelihood p(X \mid \theta) for a class of models with parameters \theta, a conjugate prior is a distribution p(\theta \mid \gamma) with hyperparameters \gamma, such that the posterior distribution

p(\theta \mid X, \gamma) = \alpha\, p(X \mid \theta)\, p(\theta \mid \gamma) = p(\theta \mid \gamma')

belongs to the same distribution family as the prior.
(a) Suppose that the likelihood is given by the exponential distribution with rate parameter \lambda,

p(x \mid \lambda) = \lambda \exp(-\lambda x).

Show that the gamma distribution

\mathrm{Gamma}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} \exp(-\beta\lambda)

is a conjugate prior for the exponential. Derive the parameter update given observations x_1, \dots, x_N, and the prediction distribution p(x_{N+1} \mid x_1, \dots, x_N).
(a) Exponential and Gamma
The likelihood is P(X \mid \lambda) = \prod_{i=1}^{N} \lambda \exp(-\lambda x_i) and the prior is p(\lambda \mid \alpha, \beta) = \mathrm{Gamma}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1}\exp(-\beta\lambda). Let X denote the observations x_1, \dots, x_N and let s_N denote their sum. Then the posterior is

p(\lambda \mid X) \propto \lambda^{\alpha-1}\exp(-\beta\lambda) \prod_{i=1}^{N} \lambda \exp(-\lambda x_i)
= \lambda^{\alpha+N-1} \exp(-\lambda(\beta + s_N))
\propto \mathrm{Gamma}(\lambda \mid \alpha + N, \beta + s_N).

Therefore the parameter update is as follows:

\alpha \leftarrow \alpha + N
\beta \leftarrow \beta + s_N

For the prediction distribution we compute the following integral:

P(x_{N+1} \mid x_1, \dots, x_N) = \int P(x_{N+1} \mid \lambda)\, p(\lambda \mid x_1, \dots, x_N)\, d\lambda
= \int \lambda \exp(-\lambda x_{N+1})\, \mathrm{Gamma}(\lambda \mid \alpha+N, \beta+s_N)\, d\lambda
= \frac{(\beta+s_N)^{\alpha+N}}{\Gamma(\alpha+N)} \cdot \frac{\Gamma(\alpha+N)}{(\beta+s_N+x_{N+1})^{\alpha+N}} \int \lambda\, \mathrm{Gamma}(\lambda \mid \alpha+N, \beta+s_N+x_{N+1})\, d\lambda
= \frac{(\beta+s_N)^{\alpha+N}}{(\beta+s_N+x_{N+1})^{\alpha+N}} \cdot \frac{\alpha+N}{\beta+s_N+x_{N+1}},

where the penultimate step uses the standard formula \alpha/\beta for the expected value of a gamma distribution.
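As a sanity check on this derivation, the following sketch (illustrative only; the hyperparameters, data, and query point are made up) compares the closed-form predictive density against direct numerical integration of the likelihood against the posterior:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(1)
alpha, beta = 2.0, 3.0                      # made-up prior hyperparameters
x = rng.exponential(scale=1.0 / 0.7, size=50)
N, s_N = len(x), x.sum()
a_post, b_post = alpha + N, beta + s_N      # conjugate update derived above

x_new = 1.3                                 # arbitrary query point x_{N+1}
closed = a_post * b_post**a_post / (b_post + x_new) ** (a_post + 1)
numeric, _ = quad(
    lambda lam: lam * np.exp(-lam * x_new)
                * stats.gamma.pdf(lam, a=a_post, scale=1.0 / b_post),
    0.0, np.inf,
)
print(closed, numeric)   # the two values should match
```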
(b) Show that the beta distribution is a conjugate prior for the geometric distribution

p(x = k \mid \theta) = (1-\theta)^{k-1}\theta,

which gives the probability of needing exactly k Bernoulli trials (coin tosses) to obtain the first head when the probability of heads on each toss is \theta. Derive the parameter update rule and the prediction distribution.
(b) Geometric and Beta
The likelihood for a single observation of value k is P(X = k \mid \theta) = (1-\theta)^{k-1}\theta and the prior is p(\theta \mid a, b) = \mathrm{Beta}(\theta \mid a, b) = \alpha\, \theta^{a-1}(1-\theta)^{b-1}, where \alpha is the normalization constant. Then the posterior is

p(\theta \mid X) \propto \theta^{a-1}(1-\theta)^{b-1} (1-\theta)^{k-1}\theta
= \theta^{a}(1-\theta)^{b+k-2}
\propto \mathrm{Beta}(\theta \mid a+1, b+k-1).

Therefore the parameter updates are:

a \leftarrow a + 1
b \leftarrow b + k - 1

For the prediction distribution we compute the following integral:

P(X_2 = \ell \mid X_1 = k) = \int P(X_2 = \ell \mid \theta)\, p(\theta \mid X_1 = k)\, d\theta
= \int \theta (1-\theta)^{\ell-1}\, \mathrm{Beta}(\theta \mid a+1, b+k-1)\, d\theta
= \frac{\Gamma(a+b+k)}{\Gamma(a+1)\Gamma(b+k-1)} \cdot \frac{\Gamma(a+1)\Gamma(b+k+\ell-2)}{\Gamma(a+b+k+\ell-1)} \int \theta\, \mathrm{Beta}(\theta \mid a+1, b+k+\ell-2)\, d\theta
= \frac{\Gamma(a+b+k)\Gamma(b+k+\ell-2)}{\Gamma(b+k-1)\Gamma(a+b+k+\ell-1)} \cdot \frac{a+1}{a+b+k+\ell-1}
= \frac{\Gamma(a+b+k)\Gamma(b+k+\ell-2)}{\Gamma(b+k-1)\Gamma(a+b+k+\ell)}\,(a+1),

where the penultimate step uses the standard formula a/(a+b) for the expected value of a Beta distribution.
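The Beta-Geometric predictive can be checked the same way (a sketch; the hyperparameters a, b and the observations k, l are made up):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.special import gamma as G

a, b, k, l = 2.0, 3.0, 4, 2   # made-up hyperparameters and observed/query values

# Closed form derived above: Gamma(a+b+k) Gamma(b+k+l-2) (a+1) /
#                            (Gamma(b+k-1) Gamma(a+b+k+l))
closed = G(a + b + k) * G(b + k + l - 2) * (a + 1) / (G(b + k - 1) * G(a + b + k + l))

# Same quantity by integrating the geometric likelihood against the posterior
numeric, _ = quad(
    lambda t: t * (1 - t) ** (l - 1) * stats.beta.pdf(t, a + 1, b + k - 1),
    0.0, 1.0,
)
print(closed, numeric)   # both equal P(X2 = l | X1 = k)
```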
(c) Suppose p(\theta \mid \gamma_m) is a conjugate prior for the likelihood p(X \mid \theta); show that the mixture prior

p(\theta \mid \gamma_1, \dots, \gamma_M) = \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m)

is also conjugate for the same likelihood, assuming the mixture weights w_m sum to 1.
(c) Mixture Prior
The prior is given by the mixture

p(\theta \mid \gamma_1, \dots, \gamma_M) = \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m).

Moreover, we are given that p(\theta \mid \gamma_m) is a conjugate prior for the likelihood P(X \mid \theta); in other words,

p(\theta \mid X, \gamma_m) = \alpha_m\, P(X \mid \theta)\, p(\theta \mid \gamma_m) = p(\theta \mid \gamma'_m).

When we multiply the mixture prior with the likelihood, we get the following posterior:

P(\theta \mid X, \gamma_1, \dots, \gamma_M) = c\, P(X \mid \theta) \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m)
= \sum_{m=1}^{M} c\, w_m\, P(X \mid \theta)\, p(\theta \mid \gamma_m)
= \sum_{m=1}^{M} \frac{c\, w_m}{\alpha_m}\, p(\theta \mid \gamma'_m)
= \sum_{m=1}^{M} w'_m\, p(\theta \mid \gamma'_m).

Therefore we observe that the posterior has the same form as the prior, i.e., a mixture distribution with updated weights and hyperparameters.
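The following sketch makes part (c) concrete (an illustrative assumption, not from the notes: a Bernoulli likelihood with a two-component Beta mixture prior). Each component updates conjugately, and the mixture weights are reweighted by each component's marginal likelihood of the data:

```python
import numpy as np
from scipy.special import betaln

components = [(2.0, 8.0), (8.0, 2.0)]        # hyperparameters gamma_m = (a_m, b_m)
w = np.array([0.5, 0.5])                     # prior mixture weights w_m
heads, tails = 7, 3                          # observed data X

# Marginal likelihood of X under component m (up to a common binomial
# coefficient, which cancels when the weights are normalized).
log_ev = np.array([betaln(a + heads, b + tails) - betaln(a, b)
                   for a, b in components])
w_new = w * np.exp(log_ev - log_ev.max())
w_new /= w_new.sum()                                          # updated weights w'_m
posteriors = [(a + heads, b + tails) for a, b in components]  # updated gamma'_m
print(w_new, posteriors)
```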
(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood.
Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed \alpha.
(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights; i.e., the weights are the parameters to be learned.
Problem 2
Consider the probability density function (or mass function, if X is discrete) for the exponential family:

p(x; \eta) = h(x) \exp\{\eta^T u(x) - a(\eta)\}.

(a) Show that the univariate normal and the multinomial distributions belong to this family.
(b) Show that, in a generative classification model, if the class-conditional densities belong to the exponential family, then the posterior distribution for a class is a softmax function of a linear function of the feature vector x.
(c) Considering \eta to be a scalar, find an expression for da/d\eta. (Where will this expression be needed?)
(d) (For extra credit) A statistic T(x) is said to be sufficient for a parameter \eta if p(x \mid T(x); \eta) = p(x \mid T(x)), or in other words, if it is independent of \eta. Show that for a random variable X drawn from an exponential family density p(x; \eta), u(x) is a sufficient statistic for \eta. (Show that a factorization p(x; \eta) = g(u(x); \eta)\, h(x) is necessary and sufficient for u(x) to be a sufficient statistic for \eta.)
(e) (For extra credit) Suppose x_1, \dots, x_n are drawn iid from an exponential family density p(x; \eta). What is now the sufficient statistic T(x_1, \dots, x_n) for \eta?
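For part (a), here is a sketch of the univariate normal written in the stated form (treating a(\eta) as the log-normalizer; the multinomial case follows the same pattern):

```latex
\mathcal{N}(x;\mu,\sigma^2)
 = \frac{1}{\sqrt{2\pi}}
   \exp\!\Bigg\{
     \underbrace{\begin{pmatrix}\mu/\sigma^2 \\ -1/(2\sigma^2)\end{pmatrix}}_{\eta}^{T}
     \underbrace{\begin{pmatrix}x \\ x^2\end{pmatrix}}_{u(x)}
     - \underbrace{\Big(\frac{\mu^2}{2\sigma^2} + \log\sigma\Big)}_{a(\eta)}
   \Bigg\},
\qquad h(x) = \frac{1}{\sqrt{2\pi}} .
```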
III. True/False Questions
(1) Given n data points, if half are used for training and the other half for testing, the gap between the training error and the test error decreases as n increases.
(2) Maximum likelihood estimation is unbiased and has the smallest variance among all unbiased estimators, so maximum likelihood estimation has the smallest risk.
(3) Given regression functions A and B, if A is simpler than B, then A will almost surely perform better than B on the test set.
(4) Global linear regression uses all of the sample points to predict the output for a new input, whereas local linear regression uses only the samples near the query point, so global linear regression is computationally more expensive than local linear regression.
(5) Boosting and Bagging both combine multiple classifiers by voting, and both determine the weight of each individual classifier according to its accuracy.
(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F)
While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases, since the example weights become concentrated at the most difficult examples.
(7) One advantage of Boosting is that it does not overfit. (F)
(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)
(9) In regression analysis, best-subset selection can perform feature selection, but its computational cost is high when the number of features is large; ridge regression and the Lasso are computationally cheaper, and the Lasso can also perform feature selection.
(10) Overfitting is more likely to occur when the training data are scarce.
(11) Gradient descent can get trapped in local minima, but the EM algorithm cannot.
(12) In kernel regression, the parameter that most affects the balance between overfitting and underfitting is the width of the kernel.
(13) In the AdaBoost algorithm, the weights of all the misclassified points go up by the same multiplicative factor. (T)
Each weight is updated as D_{t+1}(i) \propto D_t(i) \exp(-\alpha_t y_i h_t(x_i)); for every misclassified point y_i h_t(x_i) = -1, so

\exp(-\alpha_t y_i h_t(x_i)) = \exp(\alpha_t)

is the same multiplicative factor for all of them.
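A minimal sketch of this update (illustrative numbers; not part of the original solution):

```python
import numpy as np

# AdaBoost reweighting: D_{t+1}(i) ∝ D_t(i) exp(-alpha_t * y_i * h_t(x_i)).
eps_t = 0.3                                  # weighted error of the weak learner
alpha_t = 0.5 * np.log((1 - eps_t) / eps_t)

D = np.array([0.1, 0.2, 0.3, 0.4])           # current weights D_t
agree = np.array([1, -1, 1, -1])             # y_i * h_t(x_i); -1 = misclassified
D_new = D * np.exp(-alpha_t * agree)
D_new /= D_new.sum()                         # renormalize to a distribution
print(np.exp(alpha_t), D_new)                # common factor for the misclassified
```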
7. [2 points] true/false In AdaBoost, the weighted training error \epsilon_t of the t-th weak classifier on training data with weights D_t tends to increase as a function of t.
★SOLUTION: True. In the course of boosting iterations the weak classifiers are forced to try to classify more difficult examples. The weights will increase for examples that are repeatedly misclassified by the weak classifiers. The weighted training error of the t-th weak classifier on the training data therefore tends to increase.
9. [2 points] Consider a point that is correctly classified and distant from the decision boundary. Why would an SVM's decision boundary be unaffected by this point, but the one learned by logistic regression be affected?
★SOLUTION: The hinge loss used by SVMs gives zero weight to these points, while the log-loss used by logistic regression gives a little bit of weight to these points.
(14) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution w on the training data. (F)
(15) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution w on unseen test data. (F)
(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)
(20) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.
True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a polynomial kernel of degree less than or equal to two.
(21) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined.
False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost cannot achieve zero training error.
(22) The L2 penalty in a ridge regression is equivalent to a Laplace prior on the weights. (F)
(23) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)
(24) In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)
IV. Regression
1. Consider a regularized regression problem. The figure below plots the log likelihood (mean log-probability) on the training and test sets for a quadratic regularization penalty, as the regularization parameter C takes different values. (10 points)
(1) Is the claim "the training-set log likelihood in Figure 2 never increases as C increases" correct? Justify your answer.
(2) Explain why the test-set log likelihood in Figure 2 decreases when C takes large values.
2. Consider a linear regression model with Gaussian noise, with training data as shown in the figure below. (10 points)
(1) Estimate the parameters by maximum likelihood and sketch the resulting model in figure (a). (3 points)
(2) Estimate the parameters by regularized maximum likelihood, i.e., add a regularization penalty term to the log-likelihood objective, and sketch in figure (b) the resulting model when the parameter C takes a very large value. (3 points)
(3) After regularization, does the variance of the Gaussian become larger, smaller, or stay the same? (4 points)
[Figure (a) and figure (b): scatter plots of the training data, with the vertical axis running from -0.5 to 3.5.]
3. Consider a regression problem over points in a two-dimensional input space, where the inputs lie in the unit square. Training and test samples are uniformly distributed in the unit square, and the outputs are generated according to a given model. We use linear regression with polynomial features of orders 1 through 10 to learn the relationship between x and y (each higher-order feature model includes all lower-order features), with squared-error loss.
(1) We train models with features of order 1, 2, 8, and 10 on a given set of samples and then test them on a large, independent test set. For each of the three columns below, select the appropriate model(s) (there may be more than one), and explain why the model you selected in the third column has a small test error. (10 points)

(Columns: smallest training error | largest training error | smallest test error)
Linear model with order-1 features: X
Linear model with order-2 features: X
Linear model with order-8 features: X
Linear model with order-10 features: X
(2) We repeat the exercise, training models with features of order 1, 2, 8, and 10 on a different number of samples and testing on a large, independent test set. Again select the appropriate model(s) for each of the three columns, and explain why the model you selected in the third column has a small test error. (10 points)

(Columns: smallest training error | largest training error | smallest test error)
Linear model with order-1 features: X
Linear model with order-2 features:
Linear model with order-8 features: X X
Linear model with order-10 features: X
(3) The approximation error of a polynomial regression model depends on the number of training points. (T)
(4) The structural error of a polynomial regression model depends on the number of training points. (F)
4. We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise (that is, y = w_0 + w_1 x + \dots + w_5 x^5 + \epsilon, \epsilon \sim N(0, \sigma^2)).
For training we have 100 {x, y} pairs and for testing we are using an additional set of 100 {x, y} pairs. Since we do not know the degree of the polynomial, we learn two models from the data. One model learns parameters for a polynomial of degree 4 and the other learns parameters for a polynomial of degree 6. Which of these two models is likely to fit the test data better?
Answer: the degree-6 polynomial. Since the data were generated by a degree-5 polynomial and we have enough training data, the model we learn for a degree-6 polynomial will likely fit a very small coefficient for x^6. Thus, even though it is a degree-6 polynomial, it will actually behave in a very similar way to a degree-5 polynomial, which is the correct model, leading to a better fit to the data.
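A quick simulation in the spirit of this answer (a sketch; the generating coefficients, noise level, and seed are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.2, -0.4, 0.8, 0.3, -1.0, 0.5])   # made-up degree-5 polynomial
f = lambda x: np.polyval(w_true, x)

x_tr = rng.uniform(-1, 1, 100)
y_tr = f(x_tr) + rng.normal(0, 0.05, 100)             # 100 training pairs
x_te = rng.uniform(-1, 1, 100)
y_te = f(x_te) + rng.normal(0, 0.05, 100)             # 100 test pairs

for degree in (4, 6):
    coeffs = np.polyfit(x_tr, y_tr, degree)           # least-squares polynomial fit
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(degree, test_mse)   # the degree-6 fit should generalize better
```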
5. Input-dependent noise in regression
a) Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x \ge 0).
b) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)
i. p(y \mid x) = \frac{1}{\sqrt{2\pi}\, x\sigma} \exp\Big( -\frac{(y - (w_0 + w_1 x))^2}{2 x^2 \sigma^2} \Big)

ii. p(y \mid x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\Big( -\frac{(y - (w_0 + (w_1 + w_2) x))^2}{2 \sigma^2} \Big)

iii. p(y \mid x) = \frac{1}{\sqrt{2\pi x}\, \sigma} \exp\Big( -\frac{(y - (w_0 + w_1 x))^2}{2 x \sigma^2} \Big)

(iii) is correct. In a Gaussian distribution over y, the variance is determined by the coefficient of y^2, so by replacing 1/(2\sigma^2) with 1/(2x\sigma^2) we get a variance that increases linearly with x. (Note also the change to the normalization "constant.") (i) has a quadratic dependence on x; (ii) does not change the variance at all, it just renames w_1.
c) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family(ies) you chose.
(ii) and (iii). (Note that a plot which exhibits a large variance at x = 0, with the variance apparently independent of x, cannot have come from (iii), whose variance vanishes as x approaches 0.)
d) True/False: Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model.
True. In both cases the algorithm will recover the true underlying model.
e) For the model you chose in part (b), write down the derivative of the negative log likelihood with respect to w_1.
The negative log likelihood is

L = \sum_{i=1}^{N} \Big[ \frac{1}{2}\log(2\pi x_i \sigma^2) + \frac{(y_i - (w_0 + w_1 x_i))^2}{2 x_i \sigma^2} \Big]

and the derivative w.r.t. w_1 is

\frac{\partial L}{\partial w_1} = -\sum_{i=1}^{N} \frac{y_i - (w_0 + w_1 x_i)}{\sigma^2}.

Note that for lines through the origin (w_0 = 0), the optimal solution has the particularly simple form

w_1 = \frac{\sum_i y_i}{\sum_i x_i}.

It is possible to take the derivative of the log without noticing that \log \exp(x) = x; we use log likelihoods for a good reason! Plus, they simplify the handling of multiple data points, because the product of probabilities becomes a sum of log probabilities.
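A small check of the w_0 = 0 case (a sketch with made-up slope and noise scale): direct minimization of the negative log likelihood should recover the closed form above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
sigma = 0.3
x = rng.uniform(0.1, 5.0, 500)                     # inputs assumed positive
y = 2.0 * x + rng.normal(0, sigma * np.sqrt(x))    # noise variance x * sigma^2

# Negative log likelihood for w0 = 0 (constant terms dropped).
nll = lambda w1: np.sum((y - w1 * x) ** 2 / (2 * x * sigma**2))
res = minimize_scalar(nll)
print(res.x, y.sum() / x.sum())                    # should agree
```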
V. Classification
1. Generative vs. discriminative models
(a) Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications, using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or a generative classifier? Why?
A generative model, because the density p(x \mid y) must be estimated.
(b) Your billionaire friend also wants to classify software applications to detect bug-prone applications, using features of the source code. This project has only a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
A discriminative model. With few training samples, directly classifying with a discriminative model usually works better.
(c) Your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
A generative model. With a large number of samples, the correct generative model can be learned.
2. Logistic regression
[Figure 2: Log-probability of labels on the training and test sets as a function of the regularization parameter C, for C ranging from 0 to 4.]
1. Problem: In Figure 2 we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with a quadratic regularization penalty and different values of the regularization parameter C.
Statement: In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)
2. Answer: The log-probability of labels given examples implied by the logistic regression model is a concave (convex down) function with respect to the weights. The (only) locally optimal solution is also globally optimal.
Statement: A stochastic gradient algorithm for training logistic regression models with a fixed learning rate will find the optimal setting of the weights exactly. (F)
3. Answer: A fixed learning rate means that we are always taking a finite step towards improving the log-probability of any single training example in the update equation. Unless the examples are somehow "aligned," we will continue jumping from side to side of the optimal solution and will not be able to get arbitrarily close to it. The learning rate has to approach zero in the course of the updates for the weights to converge.
Statement: The average log-probability of training labels as in Figure 2 can never increase as we increase C. (T)
Stronger regularization means more constraints on the solution, and thus the (average) log-probability of the training examples can only get worse.
Explain why in Figure 2 the test log-probability of labels decreases for large values of C.
As C increases, we give more weight to constraining the predictor and thus give less flexibility to fitting the training set. The increased regularization guarantees that the test performance gets closer to the training performance, but as we over-constrain our allowed predictors, we are no longer able to fit the training set at all, and although the test performance is now very close to the training performance, both are low.
Statement: The log-probability of labels in the test set would decrease for large values of C even if we had a large number of training examples. (T)
The above argument still holds, but the value of C for which we will observe such a decrease will scale up with the number of examples.
4. Statement: Adding a quadratic regularization penalty for the parameters when estimating a logistic regression model ensures that some of the parameters (weights associated with the components of the input vectors) vanish. (F)
A regularization penalty for feature selection must have a non-zero derivative at zero. Otherwise the regularization has no effect at zero, and weights will tend to be slightly non-zero, even when this does not improve the log-probabilities by much.
3. Regularized logistic regression
This problem refers to the binary classification task depicted in Figure 1(a), which we attempt to solve with the simple linear logistic regression model

P(y = 1 \mid x, w_1, w_2) = g(w_1 x_1 + w_2 x_2) = \frac{1}{1 + \exp(-w_1 x_1 - w_2 x_2)}

(for simplicity we do not use the bias parameter w_0). The training data can be separated with zero training error; see line L1 in Figure 1(b), for instance.
[Figure 1: (a) The two-dimensional dataset used in this problem. (b) The points can be separated by L1 (solid line). Possible other decision boundaries are shown by L2, L3, L4.]
(1) Consider a regularization approach where we try to maximize

\sum_{i=1}^{n} \log p(y_i \mid x_i, w_1, w_2) - \frac{C}{2} w_2^2

for large C. Note that only w_2 is penalized. We'd like to know which of the four lines in Figure 1(b) could arise as a result of such regularization. For each potential line L2, L3, or L4, determine whether it can result from regularizing w_2. If not, explain very briefly why not.
L2: No. When we regularize w_2, the resulting boundary can rely less on the value of x_2 and therefore becomes more vertical. L2 here seems to be more horizontal, which cannot be the result of penalizing w_2.
L3: Yes. Here w_2^2 is small relative to w_1^2 (as evidenced by the high slope), and even though L3 would assign a rather low log-probability to the observed labels, it could be forced upon us by a large regularization parameter C.
L4: No. For very large C we get a boundary that is entirely vertical (the x_2 axis). L4 here is reflected across the x_2 axis and represents a poorer solution than its counterpart on the other side. For moderate regularization we have to get the best solution that we can construct while keeping w_2 small. L4 is not the best and thus cannot come as a result of regularizing w_2.
(2) If we change the form of regularization to the one-norm (absolute value) and also regularize w_1, we get the following penalized log-likelihood:

\sum_{i=1}^{n} \log p(y_i \mid x_i, w_1, w_2) - \frac{C}{2}(|w_1| + |w_2|).

Consider again the problem in Figure 1(a) and the same linear logistic regression model. As we increase the regularization parameter C, which of the following scenarios do you expect to observe? (Choose only one.)
(x) First w_1 will become 0, then w_2.
( ) w_1 and w_2 will become zero simultaneously.
( ) First w_2 will become 0, then w_1.
( ) None of the weights will become exactly zero, only smaller as C increases.
The data can be classified with zero training error, and therefore also with high log-probability, by looking at the value of x_2 alone, i.e., by making w_1 = 0. Initially we might prefer to have a non-zero value for w_1, but it will go to zero rather quickly as we increase the regularization. Note that we pay a regularization penalty for a non-zero value of w_1, and if it doesn't help classification, why would we pay the penalty? The absolute-value regularization ensures that w_1 will indeed go to exactly zero. As C increases further, even w_2 will eventually become zero: we pay a higher and higher cost for setting w_2 to a non-zero value, and eventually this cost overwhelms the gain from the log-probability of labels that we can achieve with a non-zero w_2. Note that when w_1 = w_2 = 0, the log-probability of labels is a finite value n log(0.5).
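The chosen scenario can be reproduced with an L1-regularized logistic regression on synthetic data standing in for Figure 1(a) (a sketch; the data here are made up so that x_2 alone determines the label, and note that scikit-learn's C is the inverse of the penalty strength used above, so decreasing it corresponds to increasing the document's C):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 1] > 0).astype(int)           # labels determined by x2 alone

for C in [10.0, 1.0, 0.1, 0.01]:        # smaller sklearn C = stronger penalty
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C,
                             fit_intercept=False)
    clf.fit(X, y)
    print(C, clf.coef_.round(3))        # watch w1 hit exactly zero before w2
```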
4. SVM
Figure 4: Training set, maximum margin linear separator, and the support vectors (in bold).
(1) What is the leave-one-out cross-validation error estimate for maximum margin separation in Figure 4? (We are asking for a number.) (0)
Based on the figure we can see that removing any single point would not change the resulting maximum margin separator. Since all the points are initially classified correctly, the leave-one-out error is zero.
(2) We would expect the support vectors to remain the same in general as we move from a linear kernel to higher-order polynomial kernels. (F)
There are no guarantees that the support vectors remain the same. The feature vectors corresponding to polynomial kernels are non-linear functions of the original input vectors, and thus the support points for maximum margin separation in the feature space can be quite different.
(3) Structural risk minimization is guaranteed to find the model (among those considered) with the lowest expected loss. (F)
We are guaranteed to find only the model with the lowest upper bound on the expected loss.
(4) What is the VC-dimension of a mixture of two Gaussians model in the plane with equal covariance matrices? Why?
A mixture of two Gaussians with equal covariance matrices has a linear decision boundary. Linear separators in the plane have VC-dimension exactly 3.
5. SVM
Classify the following data points:

class  x1  x2
  +     1   1
  +     2   2
  +     2   0
  -     0   0
  -     1   0
  -     0   1

(a) Plot these six training points. Are the classes {+, -} linearly separable?
Yes.
(b) Construct the weight vector of the maximum margin hyperplane by inspection and identify the support vectors.
The maximum margin hyperplane should have a slope of -1 and should pass through (3/2, 0). Therefore its equation is x1 + x2 = 3/2, and the weight vector is (1, 1)^T.
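A quick check of (b) (a sketch using scikit-learn; a large C approximates the hard margin, and the learned weight vector is only determined up to the canonical scaling):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0],    # class +
              [0, 0], [1, 0], [0, 1]])   # class -
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)        # hard-margin approximation
clf.fit(X, y)
print(clf.coef_, clf.intercept_)         # proportional to (1, 1) and -3/2
print(clf.support_vectors_)              # the support vectors on the margin
```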
(c) If you remove one of the support vectors, does the size of the optimal margin decrease, stay the same, or increase?
In this specific dataset the optimal margin increases when we remove the support vectors (1, 0) or (1, 1), and stays the same when we remove the other two.
(d) (Extra Credit) Is your answer to (c) also true for any dataset? Provide a counterexample or give a short proof.
When we drop one of the constraints in a maximization problem, we get an optimal value which is at least as good as the previous one. This is because the set of candidates satisfying the original (larger, stronger) set of constraints is a subset of the candidates satisfying the new (smaller, weaker) set of constraints. So, for the weaker constraints, the old optimal solution is still available, and there may be additional solutions that are even better. In mathematical form:

\max_{x \in A} f(x) \le \max_{x \in B} f(x) \quad \text{whenever } A \subseteq B.

Finally, note that in SVM problems we are maximizing the margin subject to the constraints given by the training points. When we drop any of the constraints, the margin can increase or stay the same depending on the dataset, and both outcomes occur in problems with realistic datasets.