Machine Learning Question Bank
I. Maximum Likelihood
1. ML estimation of exponential model (10 points)
A Gaussian distribution is often used to model data on the real line, but it is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases an exponential distribution can be used instead; its probability density function is given by

p(x) = \frac{1}{b} e^{-x/b}, \quad x \ge 0.

Given N observations x_i drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate for b.
(a) L(\mathbf{x}; b) = \prod_{i=1}^{N} \frac{1}{b} e^{-x_i/b}

(b) \frac{\partial}{\partial b} \log L(\mathbf{x}; b) = \frac{\partial}{\partial b}\Big( -N\log b - \frac{1}{b}\sum_{i=1}^{N} x_i \Big) = -\frac{N}{b} + \frac{1}{b^2}\sum_{i=1}^{N} x_i

(c) Setting \frac{\partial}{\partial b}\log L(\mathbf{x}; b) = 0 gives \hat{b} = \frac{1}{N}\sum_{i=1}^{N} x_i.
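To make the result concrete, here is a small numerical check (a sketch, not part of the original solution; the true scale and sample size are made up): the maximizer found by a bounded scalar optimizer should coincide with the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.5, size=1000)   # N observations, true scale b = 2.5

def neg_log_likelihood(b):
    # -log L(x; b) = N log b + (1/b) * sum_i x_i
    return len(x) * np.log(b) + x.sum() / b

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x, x.mean())   # the two numbers should agree closely
```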
2. The same problem with a Poisson distribution instead:

\ell(\theta) = \sum_{i=1}^{N} \log P(x_i \mid \theta) = \sum_{i=1}^{N} \log\Big( \frac{\theta^{x_i} e^{-\theta}}{x_i!} \Big)
= \sum_{i=1}^{N} x_i \log\theta - N\theta - \sum_{i=1}^{N} \log(x_i!)
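Setting the derivative of this log likelihood to zero completes the Poisson case (a short sketch of the step the notes stop short of):

```latex
\frac{\partial \ell(\theta)}{\partial \theta}
  = \frac{1}{\theta}\sum_{i=1}^{N} x_i - N = 0
\quad\Longrightarrow\quad
\hat{\theta}_{\mathrm{ML}} = \frac{1}{N}\sum_{i=1}^{N} x_i .
```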
II. Bayesian Methods
1. Applying Bayes' rule
2. Suppose that on a multiple-choice exam an examinee knows the correct answer with probability p and guesses with probability 1 - p. Assume that an examinee who knows the correct answer answers correctly with probability 1, while one who guesses answers correctly with probability 1/m, where m is the number of choices. Given that the examinee answered the question correctly, find the probability that he knew the correct answer.
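One way to carry out the computation by Bayes' rule (writing K for "knows the answer" and C for "answers correctly"):

```latex
P(K \mid C)
 = \frac{P(C \mid K)\,P(K)}{P(C \mid K)\,P(K) + P(C \mid \bar{K})\,P(\bar{K})}
 = \frac{1 \cdot p}{1 \cdot p + \frac{1}{m}(1-p)}
 = \frac{mp}{mp + 1 - p}.
```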
3. Conjugate priors
Given a likelihood p(X \mid \theta) for a class of models with parameters \theta, a conjugate prior is a distribution p(\theta \mid \gamma) with hyperparameters \gamma, such that the posterior distribution

p(\theta \mid X, \gamma) = \alpha\, p(X \mid \theta)\, p(\theta \mid \gamma) = p(\theta \mid \gamma')

belongs to the same distribution family as the prior.
(a) Suppose that the likelihood is given by the exponential distribution with rate parameter \lambda,

p(x \mid \lambda) = \lambda \exp(-\lambda x).

Show that the gamma distribution

\mathrm{Gamma}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, \lambda^{\alpha-1} \exp(-\beta\lambda)

is a conjugate prior for the exponential. Derive the parameter update given observations x_1, \dots, x_N, and the prediction distribution p(x_{N+1} \mid x_1, \dots, x_N).
(a) Exponential and Gamma
The likelihood is P(X \mid \lambda) = \prod_{i=1}^{N} \lambda \exp(-\lambda x_i) and the prior is p(\lambda \mid \alpha, \beta) = \mathrm{Gamma}(\lambda \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} \lambda^{\alpha-1}\exp(-\beta\lambda). Let X denote the observations x_1, \dots, x_N and let s_N denote their sum. Then the posterior is

p(\lambda \mid X) \propto \lambda^{\alpha-1}\exp(-\beta\lambda) \prod_{i=1}^{N} \lambda \exp(-\lambda x_i)
= \lambda^{\alpha+N-1} \exp(-\lambda(\beta + s_N))
\propto \mathrm{Gamma}(\lambda \mid \alpha + N, \beta + s_N).

Therefore the parameter update is as follows:

\alpha \leftarrow \alpha + N
\beta \leftarrow \beta + s_N

For the prediction distribution we compute the following integral:

P(x_{N+1} \mid x_1, \dots, x_N) = \int P(x_{N+1} \mid \lambda)\, p(\lambda \mid x_1, \dots, x_N)\, d\lambda
= \int \lambda \exp(-\lambda x_{N+1})\, \mathrm{Gamma}(\lambda \mid \alpha+N, \beta+s_N)\, d\lambda
= \frac{(\beta+s_N)^{\alpha+N}}{\Gamma(\alpha+N)} \cdot \frac{\Gamma(\alpha+N)}{(\beta+s_N+x_{N+1})^{\alpha+N}} \int \lambda\, \mathrm{Gamma}(\lambda \mid \alpha+N, \beta+s_N+x_{N+1})\, d\lambda
= \frac{(\beta+s_N)^{\alpha+N}}{(\beta+s_N+x_{N+1})^{\alpha+N}} \cdot \frac{\alpha+N}{\beta+s_N+x_{N+1}},

where the penultimate step uses the standard formula \alpha/\beta for the expected value of a gamma distribution.
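As a sanity check on this derivation, the following sketch (illustrative only; the hyperparameters, data, and query point are made up) compares the closed-form predictive density against direct numerical integration of the likelihood against the posterior:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(1)
alpha, beta = 2.0, 3.0                      # made-up prior hyperparameters
x = rng.exponential(scale=1.0 / 0.7, size=50)
N, s_N = len(x), x.sum()
a_post, b_post = alpha + N, beta + s_N      # conjugate update derived above

x_new = 1.3                                 # arbitrary query point x_{N+1}
closed = a_post * b_post**a_post / (b_post + x_new) ** (a_post + 1)
numeric, _ = quad(
    lambda lam: lam * np.exp(-lam * x_new)
                * stats.gamma.pdf(lam, a=a_post, scale=1.0 / b_post),
    0.0, np.inf,
)
print(closed, numeric)   # the two values should match
```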
(b) Show that the beta distribution is a conjugate prior for the geometric distribution

p(x = k \mid \theta) = (1-\theta)^{k-1}\theta,

which gives the probability of needing exactly k Bernoulli trials (coin tosses) to obtain the first head when the probability of heads on each toss is \theta. Derive the parameter update rule and the prediction distribution.
(b) Geometric and Beta
The likelihood for a single observation of value k is P(X = k \mid \theta) = (1-\theta)^{k-1}\theta and the prior is p(\theta \mid a, b) = \mathrm{Beta}(\theta \mid a, b) = \alpha\, \theta^{a-1}(1-\theta)^{b-1}, where \alpha is the normalization constant. Then the posterior is

p(\theta \mid X) \propto \theta^{a-1}(1-\theta)^{b-1} (1-\theta)^{k-1}\theta
= \theta^{a}(1-\theta)^{b+k-2}
\propto \mathrm{Beta}(\theta \mid a+1, b+k-1).

Therefore the parameter updates are:

a \leftarrow a + 1
b \leftarrow b + k - 1

For the prediction distribution we compute the following integral:

P(X_2 = \ell \mid X_1 = k) = \int P(X_2 = \ell \mid \theta)\, p(\theta \mid X_1 = k)\, d\theta
= \int \theta (1-\theta)^{\ell-1}\, \mathrm{Beta}(\theta \mid a+1, b+k-1)\, d\theta
= \frac{\Gamma(a+b+k)}{\Gamma(a+1)\Gamma(b+k-1)} \cdot \frac{\Gamma(a+1)\Gamma(b+k+\ell-2)}{\Gamma(a+b+k+\ell-1)} \int \theta\, \mathrm{Beta}(\theta \mid a+1, b+k+\ell-2)\, d\theta
= \frac{\Gamma(a+b+k)\Gamma(b+k+\ell-2)}{\Gamma(b+k-1)\Gamma(a+b+k+\ell-1)} \cdot \frac{a+1}{a+b+k+\ell-1}
= \frac{\Gamma(a+b+k)\Gamma(b+k+\ell-2)}{\Gamma(b+k-1)\Gamma(a+b+k+\ell)}\,(a+1),

where the penultimate step uses the standard formula a/(a+b) for the expected value of a Beta distribution.
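The Beta-Geometric predictive can be checked the same way (a sketch; the hyperparameters a, b and the observations k, l are made up):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.special import gamma as G

a, b, k, l = 2.0, 3.0, 4, 2   # made-up hyperparameters and observed/query values

# Closed form derived above: Gamma(a+b+k) Gamma(b+k+l-2) (a+1) /
#                            (Gamma(b+k-1) Gamma(a+b+k+l))
closed = G(a + b + k) * G(b + k + l - 2) * (a + 1) / (G(b + k - 1) * G(a + b + k + l))

# Same quantity by integrating the geometric likelihood against the posterior
numeric, _ = quad(
    lambda t: t * (1 - t) ** (l - 1) * stats.beta.pdf(t, a + 1, b + k - 1),
    0.0, 1.0,
)
print(closed, numeric)   # both equal P(X2 = l | X1 = k)
```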
(c) Suppose p(\theta \mid \gamma_m) is a conjugate prior for the likelihood p(X \mid \theta); show that the mixture prior

p(\theta \mid \gamma_1, \dots, \gamma_M) = \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m)

is also conjugate for the same likelihood, assuming the mixture weights w_m sum to 1.
(c) Mixture Prior
The prior is given by the mixture

p(\theta \mid \gamma_1, \dots, \gamma_M) = \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m).

Moreover, we are given that p(\theta \mid \gamma_m) is a conjugate prior for the likelihood P(X \mid \theta); in other words,

p(\theta \mid X, \gamma_m) = \alpha_m\, P(X \mid \theta)\, p(\theta \mid \gamma_m) = p(\theta \mid \gamma'_m).

When we multiply the mixture prior with the likelihood, we get the following posterior:

P(\theta \mid X, \gamma_1, \dots, \gamma_M) = c\, P(X \mid \theta) \sum_{m=1}^{M} w_m\, p(\theta \mid \gamma_m)
= \sum_{m=1}^{M} c\, w_m\, P(X \mid \theta)\, p(\theta \mid \gamma_m)
= \sum_{m=1}^{M} \frac{c\, w_m}{\alpha_m}\, p(\theta \mid \gamma'_m)
= \sum_{m=1}^{M} w'_m\, p(\theta \mid \gamma'_m).

Therefore we observe that the posterior has the same form as the prior, i.e., a mixture distribution with updated weights and hyperparameters.
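The following sketch makes part (c) concrete (an illustrative assumption, not from the notes: a Bernoulli likelihood with a two-component Beta mixture prior). Each component updates conjugately, and the mixture weights are reweighted by each component's marginal likelihood of the data:

```python
import numpy as np
from scipy.special import betaln

components = [(2.0, 8.0), (8.0, 2.0)]        # hyperparameters gamma_m = (a_m, b_m)
w = np.array([0.5, 0.5])                     # prior mixture weights w_m
heads, tails = 7, 3                          # observed data X

# Marginal likelihood of X under component m (up to a common binomial
# coefficient, which cancels when the weights are normalized).
log_ev = np.array([betaln(a + heads, b + tails) - betaln(a, b)
                   for a, b in components])
w_new = w * np.exp(log_ev - log_ev.max())
w_new /= w_new.sum()                                          # updated weights w'_m
posteriors = [(a + heads, b + tails) for a, b in components]  # updated gamma'_m
print(w_new, posteriors)
```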
(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood.
Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed \alpha.
(e) (Extra credit, 20) Explore the case where the likelihood is a mixture with fixed components and unknown weights; i.e., the weights are the parameters to be learned.
Problem 2
Consider the probability density function (or mass function, if X is discrete) for the exponential family:

p(x; \eta) = h(x) \exp\{\eta^T u(x) - a(\eta)\}.

(a) Show that the univariate normal and the multinomial distributions belong to this family.
(b) Show that, in a generative classification model, if the class-conditional densities belong to the exponential family, then the posterior distribution for a class is a softmax function of a linear function of the feature vector x.
(c) Considering \eta to be a scalar, find an expression for da/d\eta. (Where will this expression be needed?)
(d) (For extra credit) A statistic T(x) is said to be sufficient for a parameter \eta if p(x \mid T(x); \eta) = p(x \mid T(x)), or in other words, if it is independent of \eta. Show that for a random variable X drawn from an exponential family density p(x; \eta), u(x) is a sufficient statistic for \eta. (Show that a factorization p(x; \eta) = g(u(x); \eta)\, h(x) is necessary and sufficient for u(x) to be a sufficient statistic for \eta.)
(e) (For extra credit) Suppose x_1, \dots, x_n are drawn iid from an exponential family density p(x; \eta). What is now the sufficient statistic T(x_1, \dots, x_n) for \eta?
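For part (a), here is a sketch of the univariate normal written in the stated form (treating a(\eta) as the log-normalizer; the multinomial case follows the same pattern):

```latex
\mathcal{N}(x;\mu,\sigma^2)
 = \frac{1}{\sqrt{2\pi}}
   \exp\!\Bigg\{
     \underbrace{\begin{pmatrix}\mu/\sigma^2 \\ -1/(2\sigma^2)\end{pmatrix}}_{\eta}^{T}
     \underbrace{\begin{pmatrix}x \\ x^2\end{pmatrix}}_{u(x)}
     - \underbrace{\Big(\frac{\mu^2}{2\sigma^2} + \log\sigma\Big)}_{a(\eta)}
   \Bigg\},
\qquad h(x) = \frac{1}{\sqrt{2\pi}} .
```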
III. True/False Questions
(1) Given n data points, if half are used for training and the other half for testing, the gap between the training error and the test error decreases as n increases.
(2) Maximum likelihood estimation is unbiased and has the smallest variance among all unbiased estimators, so maximum likelihood estimation has the smallest risk.
(3) Given regression functions A and B, if A is simpler than B, then A will almost surely perform better than B on the test set.
(4) Global linear regression uses all of the sample points to predict the output for a new input, whereas local linear regression uses only the samples near the query point, so global linear regression is computationally more expensive than local linear regression.
(5) Boosting and Bagging both combine multiple classifiers by voting, and both determine the weight of each individual classifier according to its accuracy.
(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F)
While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases, since the example weights become concentrated at the most difficult examples.
(7) One advantage of Boosting is that it does not overfit. (F)
(8) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)
(9) In regression analysis, best-subset selection can perform feature selection, but its computational cost is high when the number of features is large; ridge regression and the Lasso are computationally cheaper, and the Lasso can also perform feature selection.
(10) Overfitting is more likely to occur when the training data are scarce.
(11) Gradient descent can get trapped in local minima, but the EM algorithm cannot.
(12) In kernel regression, the parameter that most affects the balance between overfitting and underfitting is the width of the kernel.
(13) In the AdaBoost algorithm, the weights of all the misclassified points go up by the same multiplicative factor. (T)
Each weight is updated as D_{t+1}(i) \propto D_t(i) \exp(-\alpha_t y_i h_t(x_i)); for every misclassified point y_i h_t(x_i) = -1, so

\exp(-\alpha_t y_i h_t(x_i)) = \exp(\alpha_t)

is the same multiplicative factor for all of them.
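A minimal sketch of this update (illustrative numbers; not part of the original solution):

```python
import numpy as np

# AdaBoost reweighting: D_{t+1}(i) ∝ D_t(i) exp(-alpha_t * y_i * h_t(x_i)).
eps_t = 0.3                                  # weighted error of the weak learner
alpha_t = 0.5 * np.log((1 - eps_t) / eps_t)

D = np.array([0.1, 0.2, 0.3, 0.4])           # current weights D_t
agree = np.array([1, -1, 1, -1])             # y_i * h_t(x_i); -1 = misclassified
D_new = D * np.exp(-alpha_t * agree)
D_new /= D_new.sum()                         # renormalize to a distribution
print(np.exp(alpha_t), D_new)                # common factor for the misclassified
```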
7. [2 points] true/false In AdaBoost, the weighted training error \epsilon_t of the t-th weak classifier on training data with weights D_t tends to increase as a function of t.
★SOLUTION: True. In the course of boosting iterations the weak classifiers are forced to try to classify more difficult examples. The weights will increase for examples that are repeatedly misclassified by the weak classifiers. The weighted training error of the t-th weak classifier on the training data therefore tends to increase.
9. [2 points] Consider a point that is correctly classified and distant from the decision boundary. Why would an SVM's decision boundary be unaffected by this point, but the one learned by logistic regression be affected?
★SOLUTION: The hinge loss used by SVMs gives zero weight to these points, while the log-loss used by logistic regression gives a little bit of weight to these points.
(14) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution w on the training data. (F)
(15) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution w on unseen test data. (F)
(16) Besides the EM algorithm, gradient descent can also be used to estimate the parameters of a Gaussian mixture model. (T)
(20) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel.
True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a polynomial kernel of degree less than or equal to two.
(21) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined.
False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost cannot achieve zero training error.
(22) The L2 penalty in a ridge regression is equivalent to a Laplace prior on the weights. (F)
(23) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)
(24) In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)
IV. Regression
1. Consider a regularized regression problem. The figure below plots the log likelihood (mean log-probability) on the training and test sets for a quadratic regularization penalty, as the regularization parameter C takes different values. (10 points)
(1) Is the claim "the training-set log likelihood in Figure 2 never increases as C increases" correct? Justify your answer.
(2) Explain why the test-set log likelihood in Figure 2 decreases when C takes large values.
2. Consider a linear regression model with Gaussian noise, with training data as shown in the figure below. (10 points)
(1) Estimate the parameters by maximum likelihood and sketch the resulting model in figure (a). (3 points)
(2) Estimate the parameters by regularized maximum likelihood, i.e., add a regularization penalty term to the log-likelihood objective, and sketch in figure (b) the resulting model when the parameter C takes a very large value. (3 points)
(3) After regularization, does the variance of the Gaussian become larger, smaller, or stay the same? (4 points)
[Figure (a) and figure (b): scatter plots of the training data, with the vertical axis running from -0.5 to 3.5.]
3. Consider a regression problem over points in a two-dimensional input space, where the inputs lie in the unit square. Training and test samples are uniformly distributed in the unit square, and the outputs are generated according to a given model. We use linear regression with polynomial features of orders 1 through 10 to learn the relationship between x and y (each higher-order feature model includes all lower-order features), with squared-error loss.
(1) We train models with features of order 1, 2, 8, and 10 on a given set of samples and then test them on a large, independent test set. For each of the three columns below, select the appropriate model(s) (there may be more than one), and explain why the model you selected in the third column has a small test error. (10 points)

(Columns: smallest training error | largest training error | smallest test error)
Linear model with order-1 features: X
Linear model with order-2 features: X
Linear model with order-8 features: X
Linear model with order-10 features: X
(2) We repeat the exercise, training models with features of order 1, 2, 8, and 10 on a different number of samples and testing on a large, independent test set. Again select the appropriate model(s) for each of the three columns, and explain why the model you selected in the third column has a small test error. (10 points)

(Columns: smallest training error | largest training error | smallest test error)
Linear model with order-1 features: X
Linear model with order-2 features:
Linear model with order-8 features: X X
Linear model with order-10 features: X
(3) The approximation error of a polynomial regression model depends on the number of training points. (T)
(4) The structural error of a polynomial regression model depends on the number of training points. (F)
4. We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise (that is, y = w_0 + w_1 x + \dots + w_5 x^5 + \epsilon, \epsilon \sim N(0, \sigma^2)).
For training we have 100 {x, y} pairs and for testing we are using an additional set of 100 {x, y} pairs. Since we do not know the degree of the polynomial, we learn two models from the data. One model learns parameters for a polynomial of degree 4 and the other learns parameters for a polynomial of degree 6. Which of these two models is likely to fit the test data better?
Answer: the degree-6 polynomial. Since the data were generated by a degree-5 polynomial and we have enough training data, the model we learn for a degree-6 polynomial will likely fit a very small coefficient for x^6. Thus, even though it is a degree-6 polynomial, it will actually behave in a very similar way to a degree-5 polynomial, which is the correct model, leading to a better fit to the data.
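A quick simulation in the spirit of this answer (a sketch; the generating coefficients, noise level, and seed are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([0.2, -0.4, 0.8, 0.3, -1.0, 0.5])   # made-up degree-5 polynomial
f = lambda x: np.polyval(w_true, x)

x_tr = rng.uniform(-1, 1, 100)
y_tr = f(x_tr) + rng.normal(0, 0.05, 100)             # 100 training pairs
x_te = rng.uniform(-1, 1, 100)
y_te = f(x_te) + rng.normal(0, 0.05, 100)             # 100 test pairs

for degree in (4, 6):
    coeffs = np.polyfit(x_tr, y_tr, degree)           # least-squares polynomial fit
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(degree, test_mse)   # the degree-6 fit should generalize better
```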
5. Input-dependent noise in regression
a) Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x \ge 0).
b) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)
i. p(y \mid x) = \frac{1}{\sqrt{2\pi}\, x\sigma} \exp\Big( -\frac{(y - (w_0 + w_1 x))^2}{2 x^2 \sigma^2} \Big)

ii. p(y \mid x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\Big( -\frac{(y - (w_0 + (w_1 + w_2) x))^2}{2 \sigma^2} \Big)

iii. p(y \mid x) = \frac{1}{\sqrt{2\pi x}\, \sigma} \exp\Big( -\frac{(y - (w_0 + w_1 x))^2}{2 x \sigma^2} \Big)

(iii) is correct. In a Gaussian distribution over y, the variance is determined by the coefficient of y^2, so by replacing 1/(2\sigma^2) with 1/(2x\sigma^2) we get a variance that increases linearly with x. (Note also the change to the normalization "constant.") (i) has a quadratic dependence on x; (ii) does not change the variance at all, it just renames w_1.
c) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family(ies) you chose.
(ii) and (iii). (Note that a plot which exhibits a large variance at x = 0, with the variance apparently independent of x, cannot have come from (iii), whose variance vanishes as x approaches 0.)
d) True/False: Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model.
True. In both cases the algorithm will recover the true underlying model.
e) For the model you chose in part (b), write down the derivative of the negative log likelihood with respect to w_1.
The negative log likelihood is

L = \sum_{i=1}^{N} \Big[ \frac{1}{2}\log(2\pi x_i \sigma^2) + \frac{(y_i - (w_0 + w_1 x_i))^2}{2 x_i \sigma^2} \Big]

and the derivative w.r.t. w_1 is

\frac{\partial L}{\partial w_1} = -\sum_{i=1}^{N} \frac{y_i - (w_0 + w_1 x_i)}{\sigma^2}.

Note that for lines through the origin (w_0 = 0), the optimal solution has the particularly simple form

w_1 = \frac{\sum_i y_i}{\sum_i x_i}.

It is possible to take the derivative of the log without noticing that \log \exp(x) = x; we use log likelihoods for a good reason! Plus, they simplify the handling of multiple data points, because the product of probabilities becomes a sum of log probabilities.
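A small check of the w_0 = 0 case (a sketch with made-up slope and noise scale): direct minimization of the negative log likelihood should recover the closed form above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
sigma = 0.3
x = rng.uniform(0.1, 5.0, 500)                     # inputs assumed positive
y = 2.0 * x + rng.normal(0, sigma * np.sqrt(x))    # noise variance x * sigma^2

# Negative log likelihood for w0 = 0 (constant terms dropped).
nll = lambda w1: np.sum((y - w1 * x) ** 2 / (2 * x * sigma**2))
res = minimize_scalar(nll)
print(res.x, y.sum() / x.sum())                    # should agree
```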
V. Classification
1. Generative vs. discriminative models
(a) Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications, using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or a generative classifier? Why?
A generative model, because the density p(x \mid y) must be estimated.
(b) Your billionaire friend also wants to classify software applications to detect bug-prone applications, using features of the source code. This project has only a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
A discriminative model. With few training samples, directly classifying with a discriminative model usually works better.
(c) Your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
A generative model. With a large number of samples, the correct generative model can be learned.
2. Logistic regression
[Figure 2: Log-probability of labels on the training and test sets as a function of the regularization parameter C, for C ranging from 0 to 4.]
1. Problem: In Figure 2 we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with a quadratic regularization penalty and different values of the regularization parameter C.
Statement: In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)
2. Answer: The log-probability of labels given examples implied by the logistic regression model is a concave (convex down) function with respect to the weights. The (only) locally optimal solution is also globally optimal.
Statement: A stochastic gradient algorithm for training logistic regression models with a fixed learning rate will find the optimal setting of the weights exactly. (F)
3. Answer: A fixed learning rate means that we are always taking a finite step towards improving the log-probability of any single training example in the update equation. Unless the examples are somehow "aligned," we will continue jumping from side to side of the optimal solution and will not be able to get arbitrarily close to it. The learning rate has to approach zero in the course of the updates for the weights to converge.
Statement: The average log-probability of training labels as in Figure 2 can never increase as we increase C. (T)
Stronger regularization means more constraints on the solution, and thus the (average) log-probability of the training examples can only get worse.
Explain why in Figure 2 the test log-probability of labels decreases for large values of C.
As C increases, we give more weight to constraining the predictor and thus give less flexibility to fitting the training set. The increased regularization guarantees that the test performance gets closer to the training performance, but as we over-constrain our allowed predictors, we are no longer able to fit the training set at all, and although the test performance is now very close to the training performance, both are low.
Statement: The log-probability of labels in the test set would decrease for large values of C even if we had a large number of training examples. (T)
The above argument still holds, but the value of C for which we will observe such a decrease will scale up with the number of examples.
4. Statement: Adding a quadratic regularization penalty for the parameters when estimating a logistic regression model ensures that some of the parameters (weights associated with the components of the input vectors) vanish. (F)
A regularization penalty for feature selection must have a non-zero derivative at zero. Otherwise the regularization has no effect at zero, and weights will tend to be slightly non-zero, even when this does not improve the log-probabilities by much.
3. Regularized logistic regression
This problem refers to the binary classification task depicted in Figure 1(a), which we attempt to solve with the simple linear logistic regression model

P(y = 1 \mid x, w_1, w_2) = g(w_1 x_1 + w_2 x_2) = \frac{1}{1 + \exp(-w_1 x_1 - w_2 x_2)}

(for simplicity we do not use the bias parameter w_0). The training data can be separated with zero training error; see line L1 in Figure 1(b), for instance.
[Figure 1: (a) The two-dimensional dataset used in this problem. (b) The points can be separated by L1 (solid line). Possible other decision boundaries are shown by L2, L3, L4.]
(1) Consider a regularization approach where we try to maximize

\sum_{i=1}^{n} \log p(y_i \mid x_i, w_1, w_2) - \frac{C}{2} w_2^2

for large C. Note that only w_2 is penalized. We'd like to know which of the four lines in Figure 1(b) could arise as a result of such regularization. For each potential line L2, L3, or L4, determine whether it can result from regularizing w_2. If not, explain very briefly why not.
L2: No. When we regularize w_2, the resulting boundary can rely less on the value of x_2 and therefore becomes more vertical. L2 here seems to be more horizontal, which cannot be the result of penalizing w_2.
L3: Yes. Here w_2^2 is small relative to w_1^2 (as evidenced by the high slope), and even though L3 would assign a rather low log-probability to the observed labels, it could be forced upon us by a large regularization parameter C.
L4: No. For very large C we get a boundary that is entirely vertical (the x_2 axis). L4 here is reflected across the x_2 axis and represents a poorer solution than its counterpart on the other side. For moderate regularization we have to get the best solution that we can construct while keeping w_2 small. L4 is not the best and thus cannot come as a result of regularizing w_2.
(2) If we change the form of regularization to the one-norm (absolute value) and also regularize w_1, we get the following penalized log-likelihood:

\sum_{i=1}^{n} \log p(y_i \mid x_i, w_1, w_2) - \frac{C}{2}(|w_1| + |w_2|).

Consider again the problem in Figure 1(a) and the same linear logistic regression model. As we increase the regularization parameter C, which of the following scenarios do you expect to observe? (Choose only one.)
(x) First w_1 will become 0, then w_2.
( ) w_1 and w_2 will become zero simultaneously.
( ) First w_2 will become 0, then w_1.
( ) None of the weights will become exactly zero, only smaller as C increases.
The data can be classified with zero training error, and therefore also with high log-probability, by looking at the value of x_2 alone, i.e., by making w_1 = 0. Initially we might prefer to have a non-zero value for w_1, but it will go to zero rather quickly as we increase the regularization. Note that we pay a regularization penalty for a non-zero value of w_1, and if it doesn't help classification, why would we pay the penalty? The absolute-value regularization ensures that w_1 will indeed go to exactly zero. As C increases further, even w_2 will eventually become zero: we pay a higher and higher cost for setting w_2 to a non-zero value, and eventually this cost overwhelms the gain from the log-probability of labels that we can achieve with a non-zero w_2. Note that when w_1 = w_2 = 0, the log-probability of labels is a finite value n log(0.5).
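The chosen scenario can be reproduced with an L1-regularized logistic regression on synthetic data standing in for Figure 1(a) (a sketch; the data here are made up so that x_2 alone determines the label, and note that scikit-learn's C is the inverse of the penalty strength used above, so decreasing it corresponds to increasing the document's C):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 1] > 0).astype(int)           # labels determined by x2 alone

for C in [10.0, 1.0, 0.1, 0.01]:        # smaller sklearn C = stronger penalty
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C,
                             fit_intercept=False)
    clf.fit(X, y)
    print(C, clf.coef_.round(3))        # watch w1 hit exactly zero before w2
```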
4. SVM
Figure 4: Training set, maximum margin linear separator, and the support vectors (in bold).
(1) What is the leave-one-out cross-validation error estimate for maximum margin separation in Figure 4? (We are asking for a number.) (0)
Based on the figure we can see that removing any single point would not change the resulting maximum margin separator. Since all the points are initially classified correctly, the leave-one-out error is zero.
(2) We would expect the support vectors to remain the same in general as we move from a linear kernel to higher-order polynomial kernels. (F)
There are no guarantees that the support vectors remain the same. The feature vectors corresponding to polynomial kernels are non-linear functions of the original input vectors, and thus the support points for maximum margin separation in the feature space can be quite different.
(3) Structural risk minimization is guaranteed to find the model (among those considered) with the lowest expected loss. (F)
We are guaranteed to find only the model with the lowest upper bound on the expected loss.
(4) What is the VC-dimension of a mixture of two Gaussians model in the plane with equal covariance matrices? Why?
A mixture of two Gaussians with equal covariance matrices has a linear decision boundary. Linear separators in the plane have VC-dimension exactly 3.
5. SVM
Classify the following data points:

class  x1  x2
  +     1   1
  +     2   2
  +     2   0
  -     0   0
  -     1   0
  -     0   1

(a) Plot these six training points. Are the classes {+, -} linearly separable?
Yes.
(b) Construct the weight vector of the maximum margin hyperplane by inspection and identify the support vectors.
The maximum margin hyperplane should have a slope of -1 and should pass through (3/2, 0). Therefore its equation is x1 + x2 = 3/2, and the weight vector is (1, 1)^T.
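A quick check of (b) (a sketch using scikit-learn; a large C approximates the hard margin, and the learned weight vector is only determined up to the canonical scaling):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0],    # class +
              [0, 0], [1, 0], [0, 1]])   # class -
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)        # hard-margin approximation
clf.fit(X, y)
print(clf.coef_, clf.intercept_)         # proportional to (1, 1) and -3/2
print(clf.support_vectors_)              # the support vectors on the margin
```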
(c) If you remove one of the support vectors, does the size of the optimal margin decrease, stay the same, or increase?
In this specific dataset the optimal margin increases when we remove the support vectors (1, 0) or (1, 1), and stays the same when we remove the other two.
(d) (Extra Credit) Is your answer to (c) also true for any dataset? Provide a counterexample or give a short proof.
When we drop one of the constraints in a maximization problem, we get an optimal value which is at least as good as the previous one. This is because the set of candidates satisfying the original (larger, stronger) set of constraints is a subset of the candidates satisfying the new (smaller, weaker) set of constraints. So, for the weaker constraints, the old optimal solution is still available, and there may be additional solutions that are even better. In mathematical form:

\max_{x \in A} f(x) \le \max_{x \in B} f(x) \quad \text{whenever } A \subseteq B.

Finally, note that in SVM problems we are maximizing the margin subject to the constraints given by the training points. When we drop any of the constraints, the margin can increase or stay the same depending on the dataset, and both outcomes occur in problems with realistic datasets.