Removing stop words
Problem description
We have the danmaku (scrolling on-screen comment) data of a TV series. Remove the stop words from the comments, then output, as a list, the 10 most frequent words in the comments. A sample of the data:

```
       contents               likeCount  tv_name
0      二刷的朋友有吗                 201        1
1      我希望一切能重来                 3        1
2      这段眼神变化的太妙了             9        1
3      良心啊,一小时                  184        1
4      基本都好                        20        1
...    ...                            ...      ...
59995  这个叶爸有点东西               227       12
59996  眼镜掉在案发现场了              90       12
59997  俺的眼睛掉在厂里了              10       12
59998  他不戴假发你更不习惯            17       12
59999  那是什么药呀                    33       12
```

Output result
```
Word   Frequency
孩子   2030
爬山   1913
严良   1511
真的   1407
一个   1305
妈妈    939
演技    902
一起    865
普普    846
感觉    782
```

Problem analysis
- How can a sentence be split into words? With cut().
- How can the comment table be combined with the stop-word table? With merge().
- How can word frequencies be counted? With value_counts().

Hints
Segment the comments with the cut() function from the jieba library and convert the result into a DataFrame, merge it with the stop-word DataFrame, keep only the words that are not in the stop-word list, and count how often each remaining word occurs. This gives the required result.

Program code
```python
import pandas as pd
import jieba

# Comment data (columns: contents, likeCount, tv_name)
data = pd.read_csv(r"D:\pydata\项目四\某电视剧弹幕信息.csv")

# Stop words, separated by whitespace; the with-block closes the file
with open(r"D:\pydata\项目四\停用词.txt", "r", encoding="utf-8") as f:
    stop_word = f.read().split()
stop_word = pd.DataFrame(stop_word, columns=["stopword"])

# Join all comments into one string and segment it into words
word = pd.DataFrame(jieba.cut("".join(data["contents"].astype(str))), columns=["word"])
# Left join: stop words get a match, other words get NaN in "stopword"
word = pd.merge(word, stop_word, left_on=["word"], right_on=["stopword"], how="left")
# Keep words that are not stop words and are longer than one character
word = word.query("stopword.isnull() and word.str.len() > 1", engine="python")["word"]
word = word.value_counts()  # frequency of each word, most frequent first
print(word.head(10))
```

pandas provides a large number of functions and methods for processing data quickly and conveniently. jieba is a Chinese word-segmentation library for Python, with high performance, high accuracy and extensibility.

Step by step
`data = pd.read_csv(...)` loads the comment table shown in the problem description. Reading the stop-word file and splitting its contents on whitespace then gives the list of stop words:

['$', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ..., '非独', '靠', '顺', '顺着', '首先', '!', ',', ':', ';', '?']
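The program joins all comments into one string before segmenting, so jieba can form a word from the end of one comment and the start of the next, and it compares words against the stop-word list through a merge. As an alternative (not the course's method), here is a minimal sketch that loads the stop words into a Python set and segments each comment separately. `load_stop_words` and `top_words` are hypothetical helper names; the tokenizer is passed in as the `cut` parameter (for example `jieba.lcut`) so the counting logic can be tried without jieba installed.

```python
from collections.abc import Callable, Iterable

import pandas as pd


def load_stop_words(path: str) -> set[str]:
    """Read a whitespace-separated stop-word file into a set (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:  # the file is closed on exit
        return set(f.read().split())


def top_words(comments: Iterable[str], stop_words: set[str],
              cut: Callable[[str], list[str]], n: int = 10) -> list[str]:
    """Return the n most frequent non-stop-words (hypothetical helper).

    Each comment is segmented on its own by `cut`, e.g. jieba.lcut.
    """
    tokens = pd.Series([tok for text in comments for tok in cut(str(text))],
                       dtype="object")
    # Same rule as the program's query(): not a stop word, longer than one character
    keep = ~tokens.isin(stop_words) & (tokens.str.len() > 1)
    return tokens[keep].value_counts().head(n).index.tolist()
```

With the course data this might be called as `top_words(data["contents"], load_stop_words(r"D:\pydata\项目四\停用词.txt"), jieba.lcut)`. For the episode-1 exercise, passing `data.loc[data["tv_name"] == 1, "contents"]` instead should work, assuming `tv_name` holds the episode number as the sample table suggests. Because each comment is segmented separately, the counts can differ slightly from those of the program above.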
```
    stopword
0          $
1          0
2          1
3          2
4          3
..       ...
741        !
742        ,
743        :
744        ;
745        ?
```
The stop-word table (746 rows) built by `stop_word = pd.DataFrame(stop_word, columns=["stopword"])`.

The statement `word = pd.DataFrame(jieba.cut("".join(...)), columns=["word"])` works in three stages. First, the join concatenates every comment into one long string:

"二刷的朋友有吗我希望一切能重来这段眼神变化的太妙了良心啊,一小时……好了警官你是下一个好一个不戴眼镜的斯文败类儿子你啥时候学习啊居然还不说实话?我不戴假发更厉害演完这部电影,伊能静开始怕了你看我还有机会吗这个叶爸有点东西眼镜掉在案发现场了俺的眼睛掉在厂里了他不戴假发你更不习惯那是什么药呀"

Next, jieba.cut() splits that string into words:

['二刷', '的', '朋友', '有', '吗', '我', '希望', ……, '他', '不戴', '假发', '你', '更不', '习惯', '那是', '什么', '药', '呀']

Finally, pd.DataFrame() turns the words into a one-column table with one row per word:

```
         word
0          二刷
1           的
2          朋友
3           有
4           吗
...       ...
339467      那
339468      是
339469     什么
339470      药
339471      呀
```
The segmented words (339,472 rows).

```
         word stopword
0          二刷      NaN
1           的        的
2          朋友      NaN
3           有        有
4           吗        吗
...       ...      ...
339467      那        那
339468      是        是
339469     什么       什么
339470      药      NaN
339471      呀        呀
```
Result of `word = pd.merge(word, stop_word, left_on=["word"], right_on=["stopword"], how="left")`. The left join keeps every word: a word found in the stop-word list gets the matching value in the stopword column, and any other word gets NaN.

```
          word stopword
0           二刷      NaN
2           朋友      NaN
6           希望      NaN
9           重来      NaN
10          这段      NaN
...        ...      ...
339451    案发现场      NaN
339455      眼睛      NaN
339458      厂里      NaN
339462     戴假发      NaN
339466      习惯      NaN
```
Rows kept by `word.query("stopword.isnull() and word.str.len() > 1", engine="python")`: words that matched no stop word and are longer than one character. Selecting ["word"] then keeps only the word column.

```
Word      Count
孩子       2023
爬山       1911
严良       1511
真的       1407
一个       1305
...         ...
卡尔          1
胡成          1
亲热          1
碰过          1
案发现场      1
```
Result of `word = word.value_counts()`: the frequency of every remaining word, most frequent first. `print(word.head(10))` prints the first 10 rows.

Task summary
merge() combines related rows of two DataFrames into single rows, matching them on columns or on the index, and returns a new DataFrame. Its options, such as the join keys and the join type, make it flexible enough for practical work.

Try it yourself
Using the same comment data, remove the stop words and output, as a list, the 10 most frequent words in the comments of episode 1. The result is the following list:

['爬山', '真实', '一起', '电影', '丰田', '不错', '感觉', '欺负', '秦昊', '真的']

Production team
Produced by: 刘学 (Liu Xue), 重庆市九龙坡职业教育中心 (Chongqing Jiulongpo Vocational Education Center)

Selecting the movies men like best
Presenter: 刘学 (Liu Xue), 重庆市九龙坡职业教育中心 (Chongqing Jiulongpo Vocational Education Center)

Problem description
There are three tables, users (user information), ratings and movies (movie information), with the fields listed below. Find the 10 movies that men like best.

users: UserID (user ID), Gender, Age, Occupation, Zip-code (postal code)
movies: MovieID (movie ID), Title (movie title), Genres (genre)
ratings: UserID (user ID), MovieID (movie ID), Rating, Timestamp
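The program code reads these tables from `.dat` files with `sep='::'`, which implies one record per line with the fields above separated by `::`. A minimal sketch of that parsing, using `io.StringIO` in place of real files and two lines each built from the first rows of the movies and ratings tables shown later in this task. Because the separator is longer than one character, pandas treats it as a regular expression and needs `engine="python"`.

```python
import io

import pandas as pd

# Assumed movies.dat layout: MovieID::Title::Genres
movies_text = ("1::Toy Story (1995)::Animation|Children's|Comedy\n"
               "2::Jumanji (1995)::Adventure|Children's|Fantasy\n")
# Assumed ratings.dat layout: UserID::MovieID::Rating::Timestamp
ratings_text = "1::1193::5::978300760\n1::661::3::978302109\n"

movies = pd.read_csv(io.StringIO(movies_text), sep="::", header=None,
                     names=["MovieID", "Title", "Genres"], engine="python")
ratings = pd.read_csv(io.StringIO(ratings_text), sep="::", header=None,
                      names=["UserID", "MovieID", "Rating", "Timestamp"],
                      engine="python")

print(movies)
print(ratings)
```

The shared columns are what the later merges rely on: ratings.UserID matches users.UserID, and ratings.MovieID matches movies.MovieID.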
Output result
```
MovieID  Title                                       Genres       Rating
787      Gate of Heavenly Peace, The (1995)          Documentary  5.0
985      Small Wonders (1996)                        Documentary  5.0
3233     Smashing Time (1967)                        Comedy       5.0
3280     Baby, The (1973)                            Horror       5.0
3172     Ulysses (Ulisse) (1954)                     Adventure    5.0
439      Dangerous Game (1993)                       Drama        5.0
130      Angela (1995)                               Drama        5.0
3656     Lured (1947)                                Crime        5.0
1830     Follow the Bitch (1998)                     Comedy       5.0
989      Schlafes Bruder (Brother of Sleep) (1995)   Drama        5.0
```

Problem analysis
- Which tables does the final output draw on? All three tables are needed.
- How can the tables be combined? With merge().
- How can the movies rated highest by men be found? Merge the tables first, then compute the statistics.

Hints
First merge the ratings table with the users table to obtain each movie's ID and its average rating from male users; then merge the resulting table with the movies table; finally, sort by rating in descending order. This gives the information on the movies men like best.

Program code
```python
import pandas as pd

# The .dat files separate fields with "::"; a multi-character separator needs engine="python"
movies = pd.read_table(r"D:\pydata\项目四\movies.dat", sep="::", header=None,
                       names=["MovieID", "Title", "Genres"],
                       engine="python", encoding="iso-8859-15")
ratings = pd.read_table(r"D:\pydata\项目四\ratings.dat", sep="::", header=None,
                        names=["UserID", "MovieID", "Rating", "Timestamp"],
                        engine="python", encoding="iso-8859-15")
users = pd.read_table(r"D:\pydata\项目四\users.dat", sep="::", header=None,
                      names=["UserID", "Gender", "Age", "Occupation", "Zip-code"],
                      engine="python", encoding="iso-8859-15")

info = pd.merge(ratings, users, on="UserID", how="inner")  # ① attach each rater's user info
info = info[info["Gender"] == "M"]                         # ② keep ratings from male users
info = info.groupby("MovieID")["Rating"].mean()            # ③ mean male rating per movie
res = pd.merge(movies, info, on="MovieID")                 # ④ add title and genres
res = res.sort_values(by="Rating", ascending=False)        # ⑤ highest mean first
res = res.round({"Rating": 2})                             # round to two decimal places
print(res.head(10))                                        # the 10 movies men like best
```
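Each movie's mean is computed from however many male users rated it, so a title rated once with 5 stars can rank as high as one rated 5 by hundreds of users. A common refinement, not part of this task, is to count the ratings too and drop movies below a minimum. The sketch below is a hypothetical helper, `rank_movies_by_gender`, with an assumed `min_count` threshold and an `ascending` flag; it follows steps ① to ⑤ and runs on tiny made-up tables.

```python
import pandas as pd


def rank_movies_by_gender(ratings, users, movies, gender="M",
                          min_count=1, ascending=False, n=10):
    """Rank movies by mean rating from users of one gender (hypothetical helper)."""
    info = pd.merge(ratings, users, on="UserID", how="inner")  # ① join ratings to users
    info = info[info["Gender"] == gender]                       # ② keep one gender
    stats = (info.groupby("MovieID")["Rating"]
                 .agg(["mean", "count"])                        # ③ mean and number of ratings
                 .rename(columns={"mean": "Rating", "count": "NumRatings"})
                 .reset_index())
    stats = stats[stats["NumRatings"] >= min_count]             # drop rarely rated movies
    res = pd.merge(movies, stats, on="MovieID")                 # ④ add title and genres
    # ⑤ stable sort: movies with equal ratings stay in movies-table order
    res = res.sort_values(by="Rating", ascending=ascending, kind="mergesort")
    return res.round({"Rating": 2}).head(n)


# Tiny made-up tables in the same layout as movies, users and ratings
movies = pd.DataFrame({"MovieID": [1, 2, 3], "Title": ["A", "B", "C"],
                       "Genres": ["Drama", "Comedy", "Horror"]})
users = pd.DataFrame({"UserID": [1, 2, 3], "Gender": ["M", "M", "F"]})
ratings = pd.DataFrame({"UserID":  [1, 2, 1, 2, 3, 2],
                        "MovieID": [1, 1, 2, 2, 2, 3],
                        "Rating":  [5, 4, 3, 3, 5, 5]})

print(rank_movies_by_gender(ratings, users, movies))               # movie 3 has one 5-star rating
print(rank_movies_by_gender(ratings, users, movies, min_count=2))  # only movies with 2+ male ratings
```

With `gender="F", ascending=True` the same helper lists the movies women rate lowest, which is the exercise at the end of this task. Any `min_count` above 1 changes which movies qualify, so its results will not match the tables given in this task.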
Step by step

```
      MovieID                               Title                        Genres
0           1                    Toy Story (1995)   Animation|Children's|Comedy
1           2                      Jumanji (1995)  Adventure|Children's|Fantasy
2           3             Grumpier Old Men (1995)                Comedy|Romance
3           4            Waiting to Exhale (1995)                  Comedy|Drama
4           5  Father of the Bride Part II (1995)                        Comedy
...       ...                                 ...                           ...
3878     3948             Meet the Parents (2000)                        Comedy
3879     3949          Requiem for a Dream (2000)                         Drama
3880     3950                    Tigerland (2000)                         Drama
3881     3951             Two Family House (2000)                         Drama
3882     3952               Contender, The (2000)                Drama|Thriller
```
Data in the movies table.

```
         UserID  MovieID  Rating  Timestamp
0             1     1193       5  978300760
1             1      661       3  978302109
2             1      914       3  978301968
3             1     3408       4  978300275
4             1     2355       5  978824291
...         ...      ...     ...        ...
1000204    6040     1091       1  956716541
1000205    6040     1094       5  956704887
1000206    6040      562       5  956704746
1000207    6040     1096       4  956715648
1000208    6040     1097       4  956715569
```
Data in the ratings table.

```
      UserID Gender  Age  Occupation Zip-code
0          1      F    1          10    48067
1          2      M   56          16    70072
2          3      M   25          15    55117
3          4      M   45           7    02460
4          5      M   25          20    55455
...      ...    ...  ...         ...      ...
6035    6036      F   25          15    32603
6036    6037      F   45           1    76006
6037    6038      F   56           1    14706
6038    6039      F   45           0    01060
6039    6040      M   25           6    11106
```
Data in the users table.

```
UserID  MovieID  Rating  Timestamp  Gender  Age  Occupation  Zip-code
1       1193     5       978300760  F       1    10          48067
1       661      3       978302109  F       1    10          48067
1       914      3       978301968  F       1    10          48067
1       3408     4       978300275  F       1    10          48067
1       2355     5       978824291  F       1    10          48067
...     ...      ...     ...        ...     ...  ...         ...
6040    1091     1       956716541  M       25   6           11106
6040    1094     5       956704887  M       25   6           11106
6040    562      5       956704746  M       25   6           11106
6040    1096     4       956715648  M       25   6           11106
6040    1097     4       956715569  M       25   6           11106
```
Step ①, `info = pd.merge(ratings, users, on="UserID", how="inner")`: the ratings table merged with the users table.

```
UserID  MovieID  Rating  Timestamp  Gender  Age  Occupation  Zip-code
2       1357     5       978298709  M       56   16          70072
2       3068     4       978299000  M       56   16          70072
2       1537     4       978299620  M       56   16          70072
2       647      3       978299351  M       56   16          70072
2       194      4       978299297  M       56   16          70072
...     ...      ...     ...        ...     ...  ...         ...
6040    1091     1       956716541  M       25   6           11106
6040    1094     5       956704887  M       25   6           11106
6040    562      5       956704746  M       25   6           11106
6040    1096     4       956715648  M       25   6           11106
6040    1097     4       956715569  M       25   6           11106
```
Step ②, `info = info[info["Gender"] == "M"]`: the table after keeping only male users.

```
MovieID
1       4.130552
2       3.175238
3       2.994152
4       2.482353
5       2.888298
          ...
3948    3.641838
3949    4.174107
3950    3.681818
3951    4.043478
3952    3.787986
```
Step ③, `info = info.groupby("MovieID")["Rating"].mean()`: the average rating male users gave each movie.

```
MovieID  Title                                Genres                        Rating
1        Toy Story (1995)                     Animation|Children's|Comedy   4.130552
2        Jumanji (1995)                       Adventure|Children's|Fantasy  3.175238
3        Grumpier Old Men (1995)              Comedy|Romance                2.994152
4        Waiting to Exhale (1995)             Comedy|Drama                  2.482353
5        Father of the Bride Part II (1995)   Comedy                        2.888298
...      ...                                  ...                           ...
3948     Meet the Parents (2000)              Comedy                        3.641838
3949     Requiem for a Dream (2000)           Drama                         4.174107
3950     Tigerland (2000)                     Drama                         3.681818
3951     Two Family House (2000)              Drama                         4.043478
3952     Contender, The (2000)                Drama|Thriller                3.787986
```
Step ④, `res = pd.merge(movies, info, on="MovieID")`: each movie's average male rating alongside its title and genres.

```
MovieID  Title                                   Genres       Rating
787      Gate of Heavenly Peace, The (1995)      Documentary  5.0
985      Small Wonders (1996)                    Documentary  5.0
3233     Smashing Time (1967)                    Comedy       5.0
3280     Baby, The (1973)                        Horror       5.0
3172     Ulysses (Ulisse) (1954)                 Adventure    5.0
...      ...                                     ...          ...
3460     Hillbillys in a Haunted House (1967)    Comedy       1.0
834      Phat Beach (1996)                       Comedy       1.0
3136     James Dean Story, The (1957)            Documentary  1.0
3904     Uninvited Guest, An (2000)              Drama        1.0
684      Windows (1980)                          Drama        1.0
```
Step ⑤, `res = res.sort_values(by="Rating", ascending=False)`: the table sorted by rating, highest first.

```
MovieID  Title                                          Genres        Rating
787      Gate of Heavenly Peace, The (1995)             Documentary   5.00
985      Small Wonders (1996)                           Documentary   5.00
3233     Smashing Time (1967)                           Comedy        5.00
3280     Baby, The (1973)                               Horror        5.00
3172     Ulysses (Ulisse) (1954)                        Adventure     5.00
439      Dangerous Game (1993)                          Drama         5.00
130      Angela (1995)                                  Drama         5.00
3656     Lured (1947)                                   Crime         5.00
1830     Follow the Bitch (1998)                        Comedy        5.00
989      Schlafes Bruder (Brother of Sleep) (1995)      Drama         5.00
3517     Bells, The (1926)                              Crime|Drama   5.00
2931     Time of the Gypsies (Dom za vesanje) (1989)    Drama         4.83
3245     I Am Cuba (Soy Cuba/Ya Kuba) (1964)            Drama         4.75
598      Window to Paris (1994)                         Comedy        4.67
53       Lamerica (1994)                                Drama         4.67
```
Ratings rounded to two decimal places by `res = res.round({"Rating": 2})`. `res.head(10)` then takes the first 10 rows, the ten movies listed under the output result at the start of this task.

Try it yourself
Using the movies, users and ratings tables, find the 10 movies women like least. The result is as follows:

```
MovieID  Title                                                Genres                Rating
3695     Toxic Avenger Part III: The Last Temptation of...    Comedy|Horror         1.0
75       Big Bully (1996)                                     Comedy|Drama          1.0
1439     Meet Wally Sparks (1997)                             Comedy                1.0
2207     Jamaica Inn (1939)                                   Drama                 1.0
2256     Parasite (1982)                                      Horror|Sci-Fi         1.0
3899     Circus (2000)                                        Comedy                1.0
3027     Slaughterhouse 2 (1988)                              Horror                1.0
3592     Time Masters (Les Maîtres du Temps) (1982)           Animation|Sci-Fi      1.0
3574     Carnosaur 3: Primal Species (1996)                   Horror|Sci-Fi         1.0
2039     Cheetah (1989)                                       Adventure|Children's  1.0
```

Counting the entrants for each competition event

Problem description
After the school skills competition opened, the teacher received skills-competition registration forms from every class. How can the number of entrants for each event be counted quickly? The registration files the teacher received are shown in the figure.

Output result
```
Event                  Entrants
2019C程序设计                52
2019VF数据库                 48
2020C程序设计                44
2020VF数据库                 41
三维动画                       3
二维动画制作                  18
二维动画制作(2021级)       129
图像处理(2021级)           168
图文混排                     147
幻灯片制作                   129
表格处理                     113
视频剪辑(2021级)           129
```

Problem analysis
- How can the data from several files be read into one DataFrame? Append them one after another.
- What should the data be grouped by to calculate the number of entrants for each event? By "比赛项目" (the event).

Hints
First read the first data table in the 班级表 (class forms) folder, then append the other tables after it, and finally use gro
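A minimal sketch of the approach in the hints, assuming each class's registration form is a CSV file in one folder, with one row per entrant and a 比赛项目 (event) column; the file format, folder layout and the helper name `count_entrants` are assumptions, since the original files appear only in a figure. It appends the tables with `pd.concat` (`DataFrame.append` was removed in pandas 2.0) and counts the rows in each event group with `groupby(...).size()`.

```python
from pathlib import Path

import pandas as pd


def count_entrants(folder: str, column: str = "比赛项目") -> pd.Series:
    """Stack the class forms in `folder` and count entrants per event.

    Hypothetical helper: assumes one CSV file per class and one row per entrant.
    """
    files = sorted(Path(folder).glob("*.csv"))
    if not files:
        raise FileNotFoundError(f"no CSV files found in {folder}")
    # Read the first form, then append the others after it
    entries = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
    # With one row per entrant, the size of each group is the entrant count
    return entries.groupby(column).size().rename("人数")
```

Call it with the path of the folder that holds the forms, for example `print(count_entrants("班级表"))`. If the forms are Excel workbooks instead, `pd.read_excel` (which needs an Excel engine such as openpyxl) can replace `pd.read_csv`, with the glob pattern changed to match.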