Suzhou Institute of Industrial Technology (蘇州工業職業技術學院)
Teacher Lesson Plan: Cover Page

Topic: Project 8, Selenium Project Practice on a Social Site
Class type: Integrated theory and practice
Classes taught: Big Data 22C1, Big Data 22C2
Class hours: 8
Teaching objectives:
(1) Select elements in Selenium with CSS selectors.
(2) Execute JS from Selenium to move the browser scrollbar.
(3) Use the match groups returned by re.match.
(4) Store parsed content in a CSV file.
(5) Read a CSV file into a DataFrame.
(6) Segment words with Jieba.
(7) Remove stopwords.
(8) Sort a list of key-value pairs.
(9) Get to know ECharts bar charts.
(10) Get to know ECharts word clouds.
Key points:
(1) Select elements in Selenium with CSS selectors.
(2) Use the match groups returned by re.match.
Difficult point: Use the match groups returned by re.match.
Learner analysis: Students start with no background. Teaching emphasizes guiding them to learn independently and to look up reference material.
Teaching outcome:
Post-class notes:

8.3.1 Task 1: Collecting data with Selenium and CSS selectors

Using the arXiv daily academic digests (學術速遞) on a social site as the example, this task walks through collecting papers in the field 計算機視覺與模式識別 (Computer Vision and Pattern Recognition) with Selenium and CSS selectors.

Step 1: Go to the bin directory under the Tomcat installation directory and run startup.bat. Then open Chrome and visit :50001/social/index.html. If the site home page appears, Tomcat is running normally.

Step 2: Open PyCharm and choose Create New Project to open the New Project window. Create the project social, select the "Existing interpreter" option, and set Interpreter to python.exe under the Anaconda installation directory. Click Create to enter the social project.

Step 3: Right-click the project root, choose New -> Python File, and create social_spider.py.

Step 4: Fill in social_spider.py so that it opens the site home page and then closes it, as a test of ChromeDriver.

    # encoding=utf-8
    import random
    import time
    from selenium import webdriver

    start_url = ":50001/social/index.html"

    if __name__ == '__main__':
        # Initialize ChromeDriver
        browser = webdriver.Chrome()
        # Open the home page
        browser.get(start_url)
        # Maximize the window
        browser.maximize_window()
        # Pause for a random 2 to 6 seconds (randint(2, 5) is inclusive)
        time.sleep(random.randint(2, 5) + random.random())
        # Quit the browser and release its resources
        browser.quit()

Run social_spider.py. Chrome opens the site home page and then closes.

Step 5: Open the site home page, right-click the first digest, and choose "Inspect". In the "Elements" panel on the right, look for the digest div. Compare with the page on the left to find the element that corresponds to a digest, as shown in the figure below.
The figure shows that the CSS pattern of a digest is "div.Feed".

Step 6: Define the do_once function in social_spider.py and call it from the main block. The find_element_by_css_selector and find_elements_by_css_selector helpers used here were removed in Selenium 4.3; on newer versions, call find_element(By.CSS_SELECTOR, ...) and find_elements(By.CSS_SELECTOR, ...) instead, with By imported from selenium.webdriver.common.by.

    import re

    def do_once():
        # Locate the list of daily digests
        topics = browser.find_elements_by_css_selector("div.Feed")
        for topic in topics:
            # Title of the daily digest
            content_title = topic.find_element_by_css_selector('h2.ContentItem-title a').text
            # Group 1 is the field name, group 2 is the date inside [...]
            content_title_groups = re.match(r"(.+)\[([\d.]+)\]", content_title)
            print(content_title_groups[1], content_title_groups[2])

    if __name__ == '__main__':
        ……
        # Pause for a random 2 to 6 seconds
        time.sleep(random.randint(2, 5) + random.random())
        # Scrape the home page
        do_once()
        # Quit the browser and release its resources
        browser.quit()

Run social_spider.py. The output is:

    統計學學術速遞 2022.7.18
    計算機視覺與模式識別學術速遞 2022.7.15
    自然語言處理學術速遞 2022.7.15
    人工智能學術速遞 2022.7.15
    機器學習學術速遞 2022.7.15
    語音|音頻處理學術速遞 2022.7.15
    金融|經濟學術速遞 2022.7.15
    機器人相關學術速遞 2022.7.15
    統計學學術速遞 2022.7.15
    計算機視覺與模式識別學術速遞 2022.7.14
    自然語言處理學術速遞 2022.7.14
    人工智能學術速遞 2022.7.14
    機器學習學術速遞 2022.7.14
    語音|音頻處理學術速遞 2022.7.14
    金融|經濟學術速遞 2022.7.14
    機器人相關學術速遞 2022.7.14
    統計學學術速遞 2022.7.14
    計算機視覺與模式識別學術速遞 2022.7.13
    自然語言處理學術速遞 2022.7.13
    人工智能學術速遞 2022.7.13

Step 7: Go back to the site home page, right-click the first digest, and choose "Inspect". In the "Elements" panel on the right, look for the 閱讀全文 (Read more) input. Compare with the page on the left to find the element that corresponds to the 閱讀全文 input, as shown in the figure below.
The figure shows that the CSS pattern of the 閱讀全文 input is "input.ContentItem-more".

Step 8: Extend do_once in social_spider.py to expand the 閱讀全文 div. Digests from any field other than 計算機視覺與模式識別 are skipped.

    def do_once():
        # Locate the list of daily digests
        topics = browser.find_elements_by_css_selector("div.Feed")
        for topic in topics:
            ……
            if content_title_groups:
                field_name = content_title_groups[1]
                # Only process papers in the field 計算機視覺與模式識別
                if field_name.find("計算機視覺與模式識別") > -1:
                    field_name = "計算機視覺與模式識別"
                    # Expand the 閱讀全文 div
                    readall_btn = topic.find_element_by_css_selector('input.ContentItem-more')
                    readall_btn.click()
                    # Set the implicit wait: later element lookups retry for up to 2 seconds
                    browser.implicitly_wait(2)

Step 9: Go back to the site home page, right-click the second digest, and choose "Inspect". In the "Elements" panel on the right, look for the digest detail div. Compare with the page on the left to find the element that corresponds to the digest detail. The CSS pattern of the digest detail is "div.RichContent-inner>span".

Step 10: Extend do_once in social_spider.py to split the detail text into lines.

    def do_once():
        # Locate the list of daily digests
        topics = browser.find_elements_by_css_selector("div.Feed")
        for topic in topics:
            ……
            if content_title_groups:
                field_name = content_title_groups[1]
                # Only process papers in the field 計算機視覺與模式識別
                if field_name.find("計算機視覺與模式識別") > -1:
                    ……
                    # Set the implicit wait: later element lookups retry for up to 2 seconds
                    browser.implicitly_wait(2)
                    # Read the full text of the day's digest; lines are separated by "\n"
                    rich_content = topic.find_element_by_css_selector("div.RichContent-inner>span").text
                    lines = rich_content.split("\n")

Step 11: Extend do_once in social_spider.py to extract the digest details.

    def do_once():
        # Locate the list of daily digests
        topics = browser.find_elements_by_css_selector("div.Feed")
        for topic in topics:
            ……
            if content_title_groups:
                field_name = content_title_groups[1]
                # Only process papers in the field 計算機視覺與模式識別
                if field_name.find("計算機視覺與模式識別") > -1:
                    ……
                    lines = rich_content.split("\n")
                    # Extract the paper information carried by each line
                    article_sec_end = False
                    for line in lines:
                        # Subfield heading, e.g. "<name>(<n>篇)"
                        sub_field_head_groups = re.match(r"(.+)\((\d+)篇\)", line)
                        if sub_field_head_groups:
                            sub_field_name = sub_field_head_groups[1]
                        else:
                            article_title_groups = re.match(r"【(\d+)】(.+)", line)
                            if article_title_groups:
                                article_title = article_title_groups[2].strip()
                            article_ctitle_groups = re.match(r"標題:(.+)", line)
                            if article_ctitle_groups:
                                article_ctitle = article_ctitle_groups[1].strip()
                            article_href_groups = re.match(r"鏈接:(.+)", line)
                            if article_href_groups:
                                article_href = article_href_groups[1].strip()
                            article_authors_groups = re.match(r"作者:(.+)", line)
                            if article_authors_groups:
                                article_authors = article_authors_groups[1].strip()
                            article_org_groups = re.match(r"機構:(.+)", line)
                            if article_org_groups:
                                article_org = article_org_groups[1].strip()
                                # The affiliation line is the last line of a paper entry
                                article_sec_end = True
                                print(article_title, article_ctitle, article_href, article_authors, article_org, sub_field_name)

Run social_spider.py. The output is:

    ……
    統計學學術速遞 2022.7.15
    計算機視覺與模式識別學術速遞 2022.7.14
    Symmetry-Aware Transformer-based Mirror Detection 基于對稱性感知的Transformer鏡面檢測 /abs/2207.06332 Tianyu Huang,Bowen Dong,Jiaying Lin,Xiaohui Liu,Rynson W.H. Lau,Wangmeng Zuo Harbin Institute of Technology, City University of Hong Kong Transformer
    Entry-Flipped Transformer for Inference and Prediction of Participant Behavior 用于參與者行為推理和預測的入口翻轉轉換器 /abs/2207.06235 Bo Hu,Tat-Jen Cham Nanyang Technological University, Singapore Transformer
    ……
    Left Ventricle Contouring of Apical Three-Chamber Views on 2D Echocardiography 心尖三腔心切面二維超聲心動圖左室壁輪廓 /abs/2207.06330 Alberto Gomez,Mihaela Porumb,Angela Mumith,Thierry Judge,Shan Gao,Woo-Jin Cho Kim,Jorge Oliveira,Agis Chartsias Ultromics Ltd, Oxford, UK, King’s College London, UK, Sherbrooke University, Canada 其他
    Robust and efficient computation of retinal fractal dimension through deep approximation 一種穩健高效的深度逼近視網膜分維計算方法 /abs/2207.05757 Justin Engelmann,Ana Villaplana-Velasco,Amos Storkey,Miguel O. Bernabeu CDT Biomedical AI, School of Informatics, University of Edinburgh, Centre for Medical Informatics, University of Edinburgh 其他
    自然語言處理學術速遞 2022.7.14
    ……
    計算機視覺與模式識別學術速遞 2022.7.13
    自然語言處理學術速遞 2022.7.13
    人工智能學術速遞 2022.7.13

Step 12: Extend do_once in social_spider.py to save the 計算機視覺與模式識別 paper information to a CSV file.

    import csv

    def do_once():
        # Locate the list of daily digests
        topics = browser.find_elements_by_css_selector("div.Feed")
        for topic in topics:
            ……
            if content_title_groups:
                field_name = content_title_groups[1]
                # Only process papers in the field 計算機視覺與模式識別
                if field_name.find("計算機視覺與模式識別") > -1:
                    ……
                    lines = rich_content.split("\n")
                    # Extract the paper information carried by each line
                    article_sec_end = False
                    for line in lines:
                        sub_field_head_groups = re.match(r"(.+)\((\d+)篇\)", line)
                        if sub_field_head_groups:
                            sub_field_name = sub_field_head_groups[1]
                        else:
                            ……
                            if article_org_groups:
                                ……
                                print(article_title, article_ctitle, article_href, article_authors, article_org, sub_field_name)
                            if article_sec_end:
                                # Append one row per paper; mode 'a' keeps rows from earlier runs
                                with open('arxiv_articles.csv', mode='a', encoding='utf-8', newline='') as f:
                                    csv_write = csv.writer(f)
                                    csv_write.writerow([article_title, article_ctitle, article_href, article_authors, article_org, field_name, sub_field_name])
                                article_sec_end = False

Open arxiv_articles.csv. It contains 66 rows.

    Symmetry-Aware Transformer-based Mirror Detection,基于對稱性感知的Transformer鏡面檢測,/abs/2207.06332,"Tianyu Huang,Bowen Dong,Jiaying Lin,Xiaohui Liu,Rynson W.H. Lau,Wangmeng Zuo","Harbin Institute of Technology, City University of Hong Kong",計算機視覺與模式識別,Transformer
    Entry-Flipped Transformer for Inference and Prediction of Participant Behavior,用于參與者行為推理和預測的入口翻轉轉換器,/abs/2207.06235,"Bo Hu,Tat-Jen Cham","Nanyang Technological University, Singapore",計算機視覺與模式識別,Transformer
    Trans4Map: Revisiting Holistic Top-down Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers,Trans4Map:用Vision Transformers重溫從自我中心圖像到局部中心語義的整體自上而下映射,/abs/2207.06205,"Chang Chen,Jiaming Zhang,Kailun Yang,Kunyu Peng,Rainer Stiefelhagen","CV:HCI Lab, Karlsruhe Institute of Technology",計算機視覺與模式識別,Transformer
    RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment,RTN:用于冠狀動脈CT血管級圖像質量評估的增強型Transformer網絡,/abs/2207.06177,"Yiting Lu,Jun Fu,Xin Li,Wei Zhou,Sen Liu,Xinxin Zhang,Congfu Jia,Ying Liu,Zhibo Chen","University of Science and Technology of China, Hefei, Anhui, China, The First Affiliated Hospital of Dalian Medical University, Dalian, Liaoning, China",計算機視覺與模式識別,Transformer
    ……
    MultiStream: A Simple and Fast Multiple Cameras Visual Monitor and Directly Streaming,MULTREAM:一種簡單快速的多攝像頭視頻監控和直接流媒體,/abs/2207.06078,Jinwei Lin,"Shenzhen Research Institute of Big Data, Shenzhen, China, -, -, -",計算機視覺與模式識別,其他
    A new database of Houma Alliance Book ancient handwritten characters and its baseline algorithm,一種新的侯馬聯書古手寫體字庫及其基線算法,/abs/2207.05993,"Xiaoyu Yuan,Zhibo Zhang,Yabo Sun,Zekai Xue,Xiuyan Shao,Xiaohua Huang",".School of Computer Engineering, Nanjing Institute of Technology, .Southeast University, Nanjing, .Jiangsu Province Engineering Research Center, of IntelliSense Technology and System, Nanjing, Jiangsu, Corresponding author",計算機視覺與模式識別,其他
    Left Ventricle Contouring of Apical Three-Chamber Views on 2D Echocardiography,心尖三腔心切面二維超聲心動圖左室壁輪廓,/abs/2207.06330,"Alberto Gomez,Mihaela Porumb,Angela Mumith,Thierry Judge,Shan Gao,Woo-Jin Cho Kim,Jorge Oliveira,Agis Chartsias","Ultromics Ltd, Oxford, UK, King’s College London, UK, Sherbrooke University, Canada",計算機視覺與模式識別,其他
    Robust and efficient computation of retinal fractal dimension through deep approximation,一種穩健高效的深度逼近視網膜分維計算方法,/abs/2207.05757,"Justin Engelmann,Ana Villaplana-Velasco,Amos Storkey,Miguel O. Bernabeu","CDT Biomedical AI, School of Informatics, University of Edinburgh, Centre for Medical Informatics, University of Edinburgh",計算機視覺與模式識別,其他

The rows above do not cover every paper in the digests. To collect all of the arXiv paper information, the browser scrollbar has to be moved down so that more data is loaded into the page.

Step 13: Extend the main block of social_spider.py to move the scrollbar down repeatedly, and read the arXiv paper information only after the page has loaded all of the data.

    if __name__ == '__main__':
        # Initialize ChromeDriver
        browser = webdriver.Chrome()
        # Open the home page
        browser.get(start_url)
        # Maximize the window
        browser.maximize_window()
        for i in range(9):
            # Pause for a random 2 to 6 seconds so the next batch can load
            time.sleep(random.randint(2, 5) + random.random())
            # Move the scrollbar to the bottom
            js = 'window.scrollTo(0,100000)'
            browser.execute_script(js)
        # Scrape the home page
        do_once()
        # Quit the browser and release its resources
        browser.quit()

Open arxiv_articles.csv. Rows 1 to 66 and rows 67 to 132 are duplicates: the file is opened in append mode, so the 66 rows written by the earlier run are still there and this run writes them again.

    Symmetry-Aware Transformer-based Mirror Detection,基于對稱性感知的Transformer鏡面檢測,/abs/2207.06332,"Tianyu Huang,Bowen Dong,Jiaying Lin,Xiaohui Liu,Rynson W.H. Lau,Wangmeng Zuo","Harbin Institute of Technology, City University of Hong Kong",計算機視覺與模式識別,Transformer
    Entry-Flipped Transformer for Inference and Prediction of Participant Behavior,用于參與者行為推理和預測的入口翻轉轉換器,/abs/2207.06235,"Bo Hu,Tat-Jen Cham","Nanyang Technological University, Singapore",計算機視覺與模式識別,Transformer
    ……
    Symmetry-Aware Transformer-based Mirror Detection,基于對稱性感知的Transformer鏡面檢測,/abs/2207.06332,"Tianyu Huang,Bowen Dong,Jiaying Lin,Xiaohui Liu,Rynson W.H. Lau,Wangmeng Zuo","Harbin Institute of Technology, City University of Hong Kong",計算機視覺與模式識別,Transformer
    Entry-Flipped Transformer for Inference and Prediction of Participant Behavior,用于參與者行為推理和預測的入口翻轉轉換器,/abs/2207.06235,"Bo Hu,Tat-Jen Cham","Nanyang Technological University, Singapore",計算機視覺與模式識別,Transformer
    ……
    CRFormer: A Cross-Region Transformer for Shadow Removal,CRFormer:一種用于陰影去除的跨區域Transformer,/abs/2207.01600,"Jin Wan,Hui Yin,Zhenyao Wu,Xinyi Wu,Zhihao Liu,Song Wang","Beijing Jiaotong University, University of South Carolina, China Mobile Research Institute",計算機視覺與模式識別,Transformer
    Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks,高效視覺轉換器和卷積神經網絡的動態空間稀疏化,/abs/2207.01580,"Yongming Rao,Zuyan Liu,Wenliang Zhao,Jie Zhou,Jiwen Lu","and the Department of Automation, Tsinghua University",計算機視覺與模式識別,Transformer
    ……
    Monkeypox Image Data collection,猴痘圖像數據采集,/abs/2206.01774,"Md Manjurul Ahsan,Muhammad Ramiz Uddin,Shahana Akter Luna","Industrial and Systems Engineering, University of Oklahoma, Norman, Oklahoma-, Dept. of Chemistry and Biochemistry, Medicine & Surgery, Dhaka Medical College & Hospital, Dhaka, Bangladesh-",計算機視覺與模式識別,其他
    Automatic Quantification of Volumes and Biventricular Function in Cardiac Resonance. Validation of a New Artificial Intelligence Approach,心臟共振中容量和雙心功能的自動量化。一種新的人工智能方法的驗證,/abs/2206.01746,"Ariel H. Curiale,Matías E. Calandrelli,Lucca Dellazoppa,Mariano Trevisan,Jorge Luis Bocián,Juan Pablo Bonifacio,Germán Mato","Propuesta y evaluación de un método de inteligencia artificial 1 Department of Medical Physics-The Bariloche Atomic Center-CONICET, Universidad Nacional de Cuyo, Harvard Medical School",計算機視覺與模式識別,其他

8.3.2 Task 2: Cleaning the data with Pandas

arxiv_articles.csv contains 1098 rows, of which the first 66 are duplicates that need to be removed.

Step 1: In the project root, choose New -> Python File and create data_clean.py. Read arxiv_articles.csv into a DataFrame. The column names are the English title, Chinese title, paper URL, authors, affiliations, research field, and research direction.

    # encoding=utf-8
    import pandas as pd

    # Show all columns
    pd.set_option('display.max_columns', None)
    # Display width
    pd.set_option('display.width', 200)
    # Align column names with the data when East Asian characters are present
    pd.set_option('display.unicode.east_asian_width', True)
    # Read the CSV file into a DataFrame
    labels = ["論文名字(英文)", "論文名字(中文)", "論文地址", "論文作者", "作者單位", "研究領域", "研究方向"]
    df = pd.read_csv('arxiv_articles.csv', encoding="utf-8", names=labels)
    print(df.head())

Run data_clean.py. The PyCharm console prints:

       論文名字(英文)  論文名字(中文)  論文地址  \
    0  Symmetry-Aware Transformer-based Mirror Detection  基于對稱性感知的Transformer鏡面檢測  /abs/2207.06332
    1  Entry-Flipped Transformer for Inference and Pr...  用于參與者行為推理和預測的入口翻轉轉換器  /abs/2207.06235
    2  Trans4Map: Revisiting Holistic Top-down Mappin...  Trans4Map:用Vision Transformers重溫從自我中心圖像到局部中心語義...  /abs/2207.06205
    3  RTN: Reinforced Transformer Network for Corona...  RTN:用于冠狀動脈CT血管級圖像質量評估的增強型Transformer網絡  /abs/2207.06177
    4  DynaST: Dynamic Sparse Transformer for Exempla...  Dynast:用于樣本引導圖像生成的動態稀疏轉換器  /abs/2207.06124

       論文作者  作者單位  研究領域  研究方向
    0  Tianyu Huang,Bowen Dong,Jiaying Lin,Xiaohui Li...  Harbin Institute of Technology, City Universit...  計算機視覺與模式識別  Transformer
    1  Bo Hu,Tat-Jen Cham  Nanyang Technological University, Singapore  計算機視覺與模式識別  Transformer
    2  Chang Chen,Jiaming Zhang,Kailun Yang,Kunyu Pen...  CV:HCI Lab, Karlsruhe Institute of Technology  計算機視覺與模式識別  Transformer
    3  Yiting Lu,Jun Fu,Xin Li,Wei Zhou,Sen Liu,Xinxi...  University of Science and Technology of China,...  計算機視覺與模式識別  Transformer
    4  Songhua Liu,Jingwen Ye,Sucheng Ren,Xinchao Wang  National University of Singapore  計算機視覺與模式識別  Transformer

Step 2: Extend data_clean.py to drop duplicate records. Two rows count as duplicates when both the English title and the author list match.

    # Deduplicate
    print("去重前:", df.shape)
    df.drop_duplicates(subset=["論文名字(英文)", "論文作者"], inplace=True)
    print("去重后:", df.shape)

Run data_clean.py. The PyCharm console prints:

    去重前: (1098, 7)
    去重后: (1032, 7)

Step 3: Extend data_clean.py to save the cleaned data.

    # Save the DataFrame to a CSV file without a header row or an index column
    df.to_csv("arxiv_articles_clean.csv", index=False, header=False)

Open arxiv_articles_clean.csv. It contains 1032 paper records, as shown below:

    Symmetry-Aware Transformer-based Mirror Detection,基于對稱性感知的Transformer鏡面檢測,/abs/2207.06332,"Tianyu Huang,Bowen Dong,Jiaying Lin,Xiaohui Liu,Rynson W.H. Lau,Wangmeng Zuo","Harbin Institute of Technology, City University of Hong Kong",計算機視覺與模式識別,Transformer
    Entry-Flipped Transformer for Inference and Prediction of Participant Behavior,用于參與者行為推理和預測的入口翻轉轉換器,/abs/2207.06235,"Bo Hu,Tat-Jen Cham","Nanyang Technological University, Singapore",計算機視覺與模式識別,Transformer
    Trans4Map: Revisiting Holistic Top-down Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers,Trans4Map:用Vision Transformers重溫從自我中心圖像到局部中心語義的整體自上而下映射,/abs/2207.06205,"Chang Chen,Jiaming Zhang,Kailun Yang,Kunyu Peng,Rainer Stiefelhagen","CV:HCI Lab, Karlsruhe Institute of Technology",計算機視覺與模式識別,Transformer
    ……
    Single pixel imaging at high pixel resolutions,高像素分辨率的單像素成像,/abs/2206.02510,"Rafał Stojek,Anna Pastuszczak,Piotr Wróbel,Rafał Kotyński","RAFAŁ KOTY′NSKI, , University of Warsaw, Pasteura, -, Warsaw, Poland, Vigo System, Poznańska, , -, Ożarów Mazowiecki, Poland",計算機視覺與模式識別,其他
    Monkeypox Image Data collection,猴痘圖像數據采集,/abs/2206.01774,"Md Manjurul Ahsan,Muhammad Ramiz Uddin,Shahana Akter Luna","Industrial and Systems Engineering, University of Oklahoma, Norman, Oklahoma-, Dept. of Chemistry and Biochemistry, Medicine & Surgery, Dhaka Medical College & Hospital, Dhaka, Bangladesh-",計算機視覺與模式識別,其他
    Automatic Quantification of Volumes and Biventricular Function in Cardiac Resonance. Validation of a New Artificial Intelligence Approach,心臟共振中容量和雙心功能的自動量化。一種新的人工智能方法的驗證,/abs/2206.01746,"Ariel H. Curiale,Matías E. Calandrelli,Lucca Dellazoppa,Mariano Trevisan,Jorge Luis Bocián,Juan Pablo Bonifacio,Germán Mato","Propuesta y evaluación de un método de inteligencia artificial 1 Department of Medical Physics-The Bariloche Atomic Center-CONICET, Universidad Nacional de Cuyo, Harvard Medical School",計算機視覺與模式識別,其他

8.3.3 Task 3: Analyzing the data with Pandas

With the data cleaned, load it with Pandas, segment the Chinese paper titles with Jieba, and count how often each word is used.

Step 1: In the project root, choose New -> Python File and create data_analysis.py. Read arxiv_articles_clean.csv into a DataFrame object.

    # encoding=utf-8
    import jieba
    import pandas as pd

    # Show all columns
    pd.set_option('display.max_columns', None)
    # Display width
    pd.set_option('display.width', 200)
    # Align column names with the data when East Asian characters are present
    pd.set_option('display.unicode.east_asian_width', True)
    # Read the cleaned CSV file into a DataFrame
    labels = ["論文名字(英文)", "論文名字(中文)", "論文地址", "論文作者", "作者單位", "研究領域", "研究方向"]
    df = pd.read_csv('arxiv_articles_clean.csv', encoding="utf-8", names=labels)
    print(df.head())

Run data_analysis.py. The PyCharm console prints:

       論文名字(英文)  論文名字(中文)  論文地址  \
    0  Symmetry-Aware Transformer-based Mirror Detection  基于對稱性感知的Transformer鏡面檢測  /abs/2207.06332
    1  Entry-Flipped Transformer for Inference and Pr...  用于參與者行為推理和預測的入口翻轉轉換器  /abs/2207.06235
    2  Trans4Map: Revisiting Holistic Top-down Mappin...  Trans4Map:用Vision Transformers重溫從自我中心圖像到局部中心語義...  /abs/2207.06205
    3  RTN: Reinforced Transformer Network for Corona...  RTN:用于冠狀動脈CT血管級圖像質量評估的增強型Transformer網絡  /abs/2207.06177
    4  DynaST: Dynamic Sparse Transformer for Exempla...  Dynast:用于樣本引導圖像生成的動態稀疏轉換器  /abs/2207.06124

       論文作者  作者單位  研究領域  研究方向
    0  Tianyu Huang,Bowen Dong,Jiaying Lin,Xiaohui Li...  Harbin Institute of Technology, City Universit...  計算機視覺與模式識別  Transformer
    1  Bo Hu,Tat-Jen Cham  Nanyang Technological University, Singapore  計算機視覺與模式識別  Transformer
    2  Chang Chen,Jiaming Zhang,Kailun Yang,Kunyu Pen...  CV:HCI Lab, Karlsruhe Institute of Technology  計算機視覺與模式識別  Transformer
    3  Yiting Lu,Jun Fu,Xin Li,Wei Zhou,Sen Liu,Xinxi...  University of Science and Technology of China,...  計算機視覺與模式識別  Transformer
    4  Songhua Liu,Jingwen Ye,Sucheng Ren,Xinchao Wang  National University of Singapore  計算機視覺與模式識別  Transformer

Step 2: Extend data_analysis.py to segment the Chinese paper titles with Jieba and count how often each word is used. With cut_all=True, Jieba runs in full mode and returns every word it can find, including overlapping ones.

    # Segment the titles into words
    counts = {}
    titles = df["論文名字(中文)"].tolist()
    for title in titles:
        words_raw = jieba.lcut(title, cut_all=True)
        for word in words_raw:
            # Skip single-character tokens
            if len(word) <= 1:
                continue
            else:
                counts[word] = counts.get(word, 0) + 1  # count
    print(counts)

Run data_analysis.py. The PyCharm console prints:

    {'基于': 285, '對稱': 7, '對稱性': 3, '性感': 6, '感知': 39, 'Transformer': 37, '鏡面': 2, '檢測': 143, '用于': 150, '參與': 2, '參與者': 2, ……'高像素': 1, '容量': 1, '心功能': 1}

Some of these words are function words, for example '基于' and '用于'. Counting them carries no meaning, so they should be skipped in the word-frequency statistics.

Step 3: In the project root, choose New -> File and create stopwords.txt to define the stopword list, one word per line.

    基于
    用于
    可以
    進行
    面向
    一個
    一種
    不可
    一次
    及其
    不同
    一對
    兩步
    更好
    使用

Step 4: Extend data_analysis.py so that the words to process skip the stopwords defined in stopwords.txt.

    # Remove stopwords from the words to process
    words_clean = []
    with open("stopwords.txt", encoding='utf-8') as f:
        stopwords = [line.strip() for line in f]
    print("處理前詞語數量:", len(counts))
    for k, v in counts.items():
        if k.strip() not in stopwords:
            words_clean.append((k, v))
    print("處理后詞語數量:", len(words_clean))

Run data_analysis.py. The PyCharm console prints:

    處理前詞語數量: 2820
    處理后詞語數量: 2805

Step 5: Extend data_analysis.py to sort the words by frequency in descending order. To make the result easy to copy into the page, the top 100 entries of the Python list are also converted to a JavaScript array of objects.

    # Sort by word frequency, descending
    words_clean.sort(key=lambda x: x[1], reverse=True)
    # Build a JS-style array of objects that is easy to copy
    html_str = "["
    for k, v in words_clean[0:100]:
        html_str = html_str + "{" + "name:'{}',value:'{}'".format(k, v) + "},"
    # Drop the "," after the last element
    html_str = html_str[0:-1] + "]"
    print(html_str)

Run data_analysis.py. The PyCharm console prints:

    [{name:'學習',value:'212'},{name:'圖像',value:'205'},{name:'檢測',value:'143'},…..,{name:'一致性',value:'17'},{name:'編碼',value:'17'},{name:'標記',value:'17'}]

8.3.4 Task 4: Visualizing the data with ECharts

A word cloud shows word frequency at a glance. Here a word cloud visualizes the most frequent words in the titles of 計算機視覺與模式識別 papers.

Step 1: In the project root, choose New -> Directory and create the app directory.

Step 2: Under the app directory, choose New -> Directory and create the static directory. Then copy echarts.min.js and echarts-wordcloud.min.js into app/static.

Step 3: Under the app directory, choose New -> Directory and create the templates directory.

Step 4: Under the templates directory, choose New -> HTML File and create single_chart.html.

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>單圖</title>
    </head>
    <body>
    </body>
    </html>

Step 5: Include the required JS files in single_chart.html.

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>單圖</title>
        <script src="../static/echarts.min.js"></script>
        <script src="../static/echarts-wordcloud.min.js"></script>
    </head>
    <body>
    </body>
    </html>

Step 6: Extend single_chart.html with the div element and the JS code. The data values come from the output of data_analysis.py in the previous task. mychart1 is bound to the chart1 element, and wordFreqData is passed to the underlying word cloud library, which draws the word cloud.

    <!DOCTYPE html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>單圖</title>
        <script src="../static/echarts.min.js"></script>
        <script src="../static/echarts-wordcloud.min.js"></script>
    </head>
    <body>
    <div id="chart1" style="float:left;width:600px;height:400px"></div>
    <script>
        var mychart1 = echarts.init(document.getElementById("chart1"));
        option = {
            title: {
                text: '計算機視覺與模式識別論文題目熱門詞',
                x: 'center',
                textStyle: {color: 'red', fontWeight: 'bold', fontSize: '20'}
            },
            tooltip: {show: true},
            series: [{
                name: '計算機視覺與模式識別論文題目熱門詞',
                type: 'wordCloud',
                sizeRange: [6, 66],
                textStyle: {
                    normal: {
                        color: function () {
                            return 'rgb(' + [
                                Math.round(Math.random() * 160),
                                Math.round(Math.random() * 160),
                                Math.round(Math.random() * 160)
                            ].join(',') + ')';
                        }
                    },
                    emphasis: {
                        shadowBlur: 10,
    ……
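The data array that the word cloud page uses comes from the output of data_analysis.py, where it is assembled by string concatenation. As a rough sketch of an alternative, the same list can be produced with collections.Counter and json.dumps, which takes care of quoting and emits the counts as numbers rather than strings. The function name build_wordcloud_data and the sample words are hypothetical and not part of the original project, and the sketch assumes the titles have already been segmented into a flat list of words (by Jieba or any other tokenizer).

```python
import json
from collections import Counter


def build_wordcloud_data(words, stopwords, top_n=100):
    """Return a JSON array of {"name", "value"} objects, most frequent first.

    words:     already-segmented tokens (assumed input, e.g. from jieba.lcut)
    stopwords: a set of tokens to skip
    top_n:     how many of the most frequent words to keep
    """
    # Same filtering rules as the lesson: drop single-character tokens and stopwords
    counts = Counter(
        w for w in words
        if len(w) > 1 and w.strip() not in stopwords
    )
    top = counts.most_common(top_n)
    # ensure_ascii=False keeps the Chinese words readable in the output
    return json.dumps(
        [{"name": name, "value": value} for name, value in top],
        ensure_ascii=False,
    )


if __name__ == "__main__":
    # Hypothetical sample tokens, only for illustration
    sample_words = ["圖像", "檢測", "基于", "圖像", "學習", "圖像", "檢測", "的"]
    print(build_wordcloud_data(sample_words, stopwords={"基于"}))
```

The printed JSON array is also a valid JavaScript array literal, so it can be pasted into the page in place of the hand-built string.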