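Mathematica can be used for web scraping.
Example: the 7000 senior-high-school English vocabulary words. If memorizing them gets tedious, try playing with the words instead.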

Scrape the word list from the website. It can be downloaded from http://www.kmsh.tn.edu.tw/~edu92/kmedu100/sub6-7.htm



The code is as follows:
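ClearAll["Global`*"]
url = "http://www.kmsh.tn.edu.tw/~edu92/kmedu100/senior_7000.xls";
(* Import every sheet of the workbook, then the sheet names *)
data = Import[url];
sheetNames = Import[url, "Sheets"];
(* Pair every row with the name of the sheet it came from *)
join = MapThread[Thread[List[#1, #2]] &, {data, sheetNames}];
flat = Flatten[join, 1];
(* The pattern expects cells of the form word@(part.digits)translation
   and extracts word, part of speech, translation, plus the sheet name *)
alignment = StringCases[#[[1]],
     word__ ~~ "@(" ~~ part__ ~~ "." ~~ DigitCharacter ... ~~ ")" ~~
      tran__ :> Sequence[word, part, tran, #[[2]]]][[1]] & /@ flat;
vocabulary7000 = alignment[[All, 1]] // DeleteDuplicates;
(* Four-letter words with "oo" in the middle *)
Select[vocabulary7000, StringMatchQ[#, _ ~~ "oo" ~~ _] &]
(* Four-letter words with "ee" in the middle *)
Select[vocabulary7000, StringMatchQ[#, _ ~~ "ee" ~~ _] &]
(* Words ending in "eal" *)
Select[vocabulary7000, StringMatchQ[#, __ ~~ "eal"] &]
(* Palindromes *)
Select[vocabulary7000, # === StringReverse[#] &]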
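
To try the parsing pattern and the word filters without downloading the spreadsheet, here is a minimal sketch that runs on a few made-up cells. The sample cells, the cell layout "word@(part.level)translation" (inferred from the string pattern in the code above), and the names sampleCells, parseCell, parsed and words are all assumptions for illustration; they are not taken from the real file.

```mathematica
(* Hypothetical sample cells; the "word@(part.level)translation" layout is an assumption *)
sampleCells = {"book@(n.1)書", "deep@(adj.1)深的", "meal@(n.2)一餐", "level@(n.1)水平"};

(* Same string pattern as the main code, returning one {word, part, translation} list per match *)
parseCell[cell_String] := StringCases[cell,
   w__ ~~ "@(" ~~ p__ ~~ "." ~~ DigitCharacter ... ~~ ")" ~~ t__ :> {w, p, t}];

(* Each cell yields a list of matches, so flatten one level to get a list of triples *)
parsed = Flatten[parseCell /@ sampleCells, 1];
words = parsed[[All, 1]]
(* {"book", "deep", "meal", "level"} *)

(* In a string pattern, _ is exactly one character and __ is one or more characters *)
Select[words, StringMatchQ[#, _ ~~ "oo" ~~ _] &]   (* {"book"} *)
Select[words, StringMatchQ[#, _ ~~ "ee" ~~ _] &]   (* {"deep"} *)
Select[words, StringMatchQ[#, __ ~~ "eal"] &]      (* {"meal"} *)
Select[words, # === StringReverse[#] &]            (* {"level"} *)
```

Because _ matches a single character, the first two filters only return four-letter words. Replacing _ with ___ (zero or more characters) on both sides would find "oo" or "ee" anywhere in a word of any length.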