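What is cross-media?

From the application-platform perspective, cross-media spans television, computers, mobile phones, newspapers and the iPad. From the retrieval-research perspective, it covers searching text with text, images with images, images with text, and video with text.

The "2020 Vision" article published in Nature in January 2010 pointed out that text, images, speech, video and their interaction attributes will be tightly mixed together, that is, "cross-media". The "Dealing with Data" special issue published in Science in February 2011 observed that the organization and use of data embody cross-media computing. The trend is a shift from "multimedia" research toward "cross-media" research.

Cross-media characteristics: content crossing and semantic association exist among multimedia data, and between user interaction and multimedia data. (Wu Fei, Zhuang Yueting. Internet cross-media analysis and retrieval: theory and algorithms. Journal of Computer-Aided Design & Computer Graphics, Vol. 22, No. 1, pp. 1-9, 2010.)

Main research areas of cross-media

Cross-media retrieval: the user submits a multimedia object of one type as the query example, and the system automatically finds semantically similar multimedia objects of other types.
Cross-media reasoning: moving from one type of multimedia data to another type through problem solving (OCR, for example).
Cross-media storage: existing retrieval technology for massive data mainly targets text, as in search engines such as Google and Baidu. Cross-media storage studies efficient compression, indexing and sharding methods, as well as personalized indexing of user behavior.

Challenges of cross-media analysis (from Fei Wu)

[Figure: correlated multi-modal data (audio, video, web page) projected into a shared space, with the query "raging waves" and the Japan earthquake as examples.]
How can both the semantic gap and the heterogeneity gap be bridged?

The content gap in cross-media

[Figure: the visual feature space and the auditory feature space are separated by the content gap; both are separated from the high-level semantic space (explosion, ocean, sky, bird) by the semantic gap.]

Subspace mapping based on linear transformation

[Figure: the visual feature space and the auditory feature space are mapped into a projected subspace.]

Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval
Xiaohua Zhai, Yuxin Peng and Jianguo Xiao
Institute of Computer Science & Technology, Peking University
AAAI 2013

Motivation

Existing metric learning methods were designed primarily for single-media data and cannot be directly applied to cross-media data.
The method makes full use of the structure information of the whole heterogeneous spaces.

Heterogeneous Metric Learning

Two sets of heterogeneous pairwise constraints are given: S is the set of similarity constraints and D is the set of dissimilarity constraints. Each pairwise constraint $(x_i, y_j)$ indicates whether two heterogeneous media objects $x_i$ and $y_j$ are relevant or irrelevant, as inferred from the category labels.

Joint Graph Regularized Heterogeneous Metric

The authors propose to learn linear transformation matrices U and V that map the heterogeneous media data into a common output space. The distance between two heterogeneous media objects is then measured in this common space.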
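As a minimal sketch of measuring distance in the common space: the slides do not reproduce the formula, so the code below assumes the Euclidean distance between the projections $U^\top x$ and $V^\top y$. The function name and the toy matrices are hypothetical and chosen only for illustration.

```python
import numpy as np

def heterogeneous_distance(x, y, U, V):
    """Distance between x and y after projecting both into the common space.

    Assumption: Euclidean distance between U^T x and V^T y.
    """
    return float(np.linalg.norm(U.T @ x - V.T @ y))

# Toy data: a 3-dimensional feature from one medium, a 2-dimensional one from another.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])   # maps 3-dim x into the 2-dim common space
V = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # maps 2-dim y into the same 2-dim space

x = np.array([3.0, 4.0, 5.0])   # U^T x = [3, 4]
y_far = np.array([0.0, 0.0])    # V^T y = [0, 0]
y_near = np.array([1.5, 2.0])   # V^T y = [3, 4]

d_far = heterogeneous_distance(x, y_far, U, V)
d_near = heterogeneous_distance(x, y_near, U, V)
print(d_far, d_near)
```

Although x and y live in spaces of different dimension, both distances are computed in the same 2-dimensional space, which is what makes objects from different media comparable.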
Objective function

The general regularization framework for heterogeneous distance metric learning combines a loss term with two regularizers:

$$Q(U, V) = f(U, V) + \lambda\, g(U, V) + \gamma\, r(U, V)$$

Here f(U, V) is the loss function defined on the sets of similarity and dissimilarity constraints S and D, g(U, V) and r(U, V) are regularizers defined on the target parameter matrices U and V, and the two balancing parameters are written here as $\lambda$ and $\gamma$.

Loss function

Minimizing the loss function minimizes the distances between media objects under similarity constraints and maximizes the distances between media objects under dissimilarity constraints. The elements of the matrix Z that weights the constraints are normalized column by column so that each column sums to zero, which balances the influence of the similarity constraints and the dissimilarity constraints.

Scale regularization

r(U, V) controls the scale of the parameter matrices and reduces overfitting.

Joint graph regularization

A joint undirected graph G = (V, W) is defined on the dataset (here V denotes the vertex set, not the projection matrix). Each element $w_{ij}$ of the similarity matrix $W = (w_{ij})_{(m+n)\times(m+n)}$ is the similarity between the i-th media object and the j-th media object. The symmetric similarity matrix is constructed from the label information, with $w_{ii} = 0$ for $1 \le i \le m+n$ to avoid self-reinforcement.

The normalized graph Laplacian is defined as

$$L = I - D^{-1/2} W D^{-1/2}$$

where I is an $(m+n)\times(m+n)$ identity matrix and D is an $(m+n)\times(m+n)$ diagonal matrix with $d_{ii} = \sum_j w_{ij}$. L is symmetric and positive semidefinite, with eigenvalues in the interval [0, 2].

The regularizer g(U, V) is formulated as

$$g(U, V) = \mathrm{tr}(O L O^\top)$$

where O represents all of the media objects in the learned metric space and L denotes the normalized graph Laplacian. Minimizing g(U, V) encourages the smoothness of the mapping over the joint data graph, which is constructed from the initial label information.

Iterative optimization

Orthogonal transformation matrices U and V are obtained by minimizing an objective function over X and Y, two sets of coupled media objects from different media with the same labels. U and V define two orthogonal transformation spaces in which the media objects in X and Y are projected as close to each other as possible. Minimizing this objective is equivalent to maximizing $\mathrm{tr}(X^\top U V^\top Y)$, which is solved through a singular value decomposition.

U and V are then updated in turn, one being fixed while the other is updated: differentiating Q(U, V) with respect to U and V and setting the derivative to zero gives an analytical solution for each. The updates alternate between U and V for several iterations to find a locally optimal solution.
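The joint graph regularizer described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the similarity rule (weight 1 for two objects with the same label, 0 otherwise) is assumed because the slides omit the formula, and the function names and toy labels are hypothetical.

```python
import numpy as np

def label_similarity(labels):
    """Symmetric similarity matrix W built from labels.

    Assumption: w_ij = 1 when objects i and j share a label, else 0.
    The diagonal is zeroed to avoid self-reinforcement.
    """
    labels = np.asarray(labels)
    W = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2}, with d_ii the i-th row sum of W."""
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])
    return np.eye(len(d)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]

def graph_regularizer(O, L):
    """tr(O L O^T), where column i of O is object i in the learned space."""
    return float(np.trace(O @ L @ O.T))

labels_x = [0, 0, 1]   # labels of m = 3 objects from one medium
labels_y = [0, 1, 1]   # labels of n = 3 objects from another medium
labels = np.array(labels_x + labels_y)

W = label_similarity(labels)
L = normalized_laplacian(W)
eigvals = np.linalg.eigvalsh(L)   # expected to lie in [0, 2]

O_smooth = np.eye(2)[labels].T               # objects of a class share one point
O_mixed = np.eye(2)[np.roll(labels, 1)].T    # projections that ignore the labels

g_smooth = graph_regularizer(O_smooth, L)
g_mixed = graph_regularizer(O_mixed, L)
print(eigvals.min(), eigvals.max(), g_smooth, g_mixed)
```

A projection that places same-label objects from both media at the same point incurs no penalty, while one that scatters them is penalized, which is the smoothness property the regularizer is meant to encourage.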
The iteration continues until the cross-validation performance on the training set decreases. In practice, only a few rounds are needed.

Datasets

Wikipedia: 2866 image-text pairs labeled with 10 semantic categories. The dataset is randomly split into a training set of 2173 documents and a test set of 693 documents.
XMedia: 5000 texts, 5000 images, 1000 audio clips, 500 videos and 500 3D models. The dataset is randomly split into a training set of 9600 media objects and a test set of 2400 media objects.

Experiments: features

Images: a bag-of-words (BoW) model is used. Each image is represented as a histogram over a 128-codeword SIFT codebook.
Texts: each text is represented by a 10-topic latent Dirichlet allocation (LDA) model.
Audio: each audio clip is represented by 29-dimensional MFCC features.
Videos: each video clip is segmented into shots, and a 128-dimensional BoW histogram feature is extracted for each video keyframe. The final similarity for a video is obtained by averaging the similarities of all of its keyframes.
3D models: each 3D model is first represented as the concatenated 4700-dimensional vector of a set of Light-Field descriptors.
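The concatenated vector is then reduced to a 128-dimensional vector by principal component analysis (PCA).

Experiments: baseline methods and evaluation metrics

CCA (canonical correlation analysis): learns the subspace that maximizes the correlation between two sets of heterogeneous data.
CFA (cross-modal factor analysis): adopts the criterion of minimizing the Frobenius norm between pairwise data in the transformed domain.
CCA+SMN: the current state of the art, since it considers not only correlation analysis but also semantic abstraction for different modalities.

Results are reported as MAP scores and precision-recall curves.

Unified representation of multimedia data

The representation of multimedia data refers to the data structure used to represent multimedia samples. For example, a quadruple can represent an image in a web page, or low-level visual features can be extracted and assembled into a multi-dimensional vector that represents an image in a database. Cross-media retrieval belongs to content-based multimedia retrieval, except that the retrieval objects extend from a single type of multimedia data to multiple types, with flexible crossing between them. The performance of cross-media retrieval depends heavily on the similarity matching algorithm, and similarity matching in turn relies on the representation adopted for each type of multimedia data. The design of the data representation model is therefore fundamental and important.

Suppose an unlabeled collection of image and audio data serves as the training set and is known to cover Z semantic categories. The mapping algorithm is as follows.

Step 1: clustering
1) For each semantic category $Z_i$, extract the low-level content features of the images and audio it contains and build the corresponding feature matrices $S_I$ and $S_A$;
2) For each semantic category, randomly select m image examples $I_i$ and annotate them semantically;
3) Compute the cluster centroid $ICr_i$ of $I_i$ in the low-level feature space;
4) With $ICr_i$ as the initial condition, run k-means clustering on all image data in the database;
5) Images that fall into the same cluster are given the same semantic label as $I_i$;
6) Repeat steps 1) to 4) for the audio dataset.

Correlation-preserving mapping
1) Analyze the canonical correlation between images and audio on their low-level content features, that is, compute the subspace basis vectors $W_x$ and $W_y$ corresponding to $S_I$ and $S_A$;
2) Obtain the vector representations of the visual and auditory feature vectors after mapping them into the subspace.

Cross-media correlation reasoning in the web environment

A specific application environment such as the web often provides environment-specific data features that carry more direct semantic information than the content features of the multimedia data themselves. These features can assist the content features in cross-media retrieval and improve retrieval efficiency. Web links, for example, can serve as such an auxiliary feature.

Cross-media correlation graph

A graph model is a common way to express relationships among data. It can express the images in a web environment together with the various features related to them. This representation not only describes the connections among the data clearly but also helps to discover complementary information in the data. Multiple types of multimedia data are linked by complex relationships, which fall mainly into two kinds: intra-media correlation and cross-media correlation.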
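The correlation-preserving mapping step (computing basis vectors $W_x$ and $W_y$ for $S_I$ and $S_A$, then projecting both feature sets into the subspace) can be sketched with a plain NumPy canonical correlation analysis. This is a minimal sketch, not the method's reference implementation: the small ridge term `reg`, the helper names and the synthetic data are assumptions added for illustration and numerical stability.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y, k, reg=1e-6):
    """Canonical correlation analysis on row-wise paired samples.

    X: (n, dx), Y: (n, dy). Returns basis matrices Wx (dx, k), Wy (dy, k)
    and the k largest canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    A, s, Bt = np.linalg.svd(Kx @ Cxy @ Ky)
    return Kx @ A[:, :k], Ky @ Bt.T[:, :k], s[:k]

# Synthetic stand-ins for the image and audio feature matrices S_I and S_A:
# both are noisy linear views of the same 2-dimensional latent signal.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
S_I = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
S_A = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))

W_x, W_y, corrs = cca(S_I, S_A, k=2)

# Project both media into the shared subspace.
P_I = (S_I - S_I.mean(axis=0)) @ W_x
P_A = (S_A - S_A.mean(axis=0)) @ W_y
empirical = np.corrcoef(P_I[:, 0], P_A[:, 0])[0, 1]
print(corrs, empirical)
```

After projection, image and audio samples share one coordinate system, so a similarity such as the Euclidean or cosine measure can be applied across the two media.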
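Link relation analysis

Let V, I and A denote the video, image and audio datasets, let m, n and k be the numbers of samples in V, I and A, and let $x^V_i$, $x^I_i$ and $x^A_i$ denote the feature vectors of the i-th video, the i-th image and the i-th audio clip in the database. Using the two heuristic rules below, the link relationships between the web pages that host the multimedia data can be used to measure the correlation (cross-media distance) between multimedia data of different types:

Rule 1: if two media objects a and b belong to the same web page, then a and b are semantically similar.
Rule 2: if web page A points to two other pages B and C, then the multimedia objects contained in B and the multimedia objects contained in C are semantically similar.

Based on these heuristic rules, the cross-media correlation matrices $L_{VI}$, $L_{IA}$ and $L_{AV}$ are built for video-image, image-audio and audio-video. Taking $L_{IA}$ as an example, its element $r_{ij}$ is the correlation value between the i-th image and the j-th audio clip, and is computed as follows:

Input: image and audio data obtained from web pages
Output: cross-media correlation matrix $L_{IA}$
The final step (step 5) constructs a symmetric matrix $L_{IA}$ whose cell $l_{ij}$ is the normalized value of $r_{ij}$.

Global correlation reasoning based on the graph model

[Figure: a graph whose nodes are images (Ia, Ib, Ic, Id) and audio clips (Aa, Ab, Ac).]

Recent research hot topics (from Fei Wu)

Cross-media Retrieval
Cross-media Ranking
Cross-media Hashing
Cross-collection Topic Modeling

Mission: learn one appropriate metric for ranking multi-modal data to preserve the orders of relevance.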
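The two heuristic rules of the link relation analysis can be turned into a small image-audio correlation matrix as sketched below. The slides do not preserve the intermediate steps that compute $r_{ij}$, so the scoring here is an assumption: each rule that fires for an image-audio pair adds one vote, and the votes are normalized by the largest entry. The function name, page names and media identifiers are hypothetical.

```python
import numpy as np

def link_correlation(pages, links, images, audios):
    """Image-audio correlation matrix from page membership and page links.

    pages: dict mapping a page name to the set of media objects it contains.
    links: dict mapping a page name to the list of pages it points to.
    Assumption: every rule match adds one vote to r_ij.
    """
    img_idx = {name: i for i, name in enumerate(images)}
    aud_idx = {name: j for j, name in enumerate(audios)}
    R = np.zeros((len(images), len(audios)))

    def add_votes(objs_a, objs_b):
        # One vote for every (image in objs_a, audio in objs_b) pair.
        for a in objs_a:
            for b in objs_b:
                if a in img_idx and b in aud_idx:
                    R[img_idx[a], aud_idx[b]] += 1.0

    # Rule 1: objects on the same page are semantically similar.
    for objs in pages.values():
        add_votes(objs, objs)

    # Rule 2: objects on two different pages linked from one page are similar.
    for targets in links.values():
        for b in targets:
            for c in targets:
                if b != c:
                    add_votes(pages.get(b, ()), pages.get(c, ()))

    peak = R.max() if R.size else 0.0
    return R / peak if peak > 0 else R

pages = {
    "p1": {"img_a", "aud_a"},   # rule 1 links img_a and aud_a
    "p4": {"img_a", "aud_a"},   # the same pair co-occurs a second time
    "p2": {"img_b"},
    "p3": {"aud_b"},
    "hub": set(),
}
links = {"hub": ["p2", "p3"]}   # rule 2 links img_b and aud_b

L_IA = link_correlation(pages, links, ["img_a", "img_b"], ["aud_a", "aud_b"])
print(L_IA)
```

Pairs supported by more pages or links receive higher normalized values, while pairs that no rule connects stay at zero and must rely on content features alone.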