
[php] Simulating a website login with curl to scrape data

davidou, April 9, 2015

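A friend recently asked me how to simulate logging in to a website in order to scrape data from it.

I'm sure I wrote this up somewhere before, probably on a BBS. Anyway, I'll just write it up again.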

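To fetch a website with curl, the curl extension has to be enabled in your php.ini first, otherwise you will get errors.

That means the line "extension=php_curl.dll" must be active (not commented out with a leading semicolon). If you have trouble with it, you can refer to this.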
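A quick way to confirm the extension is really loaded before going further is a check like the one below. This is only a minimal sketch; the helper name curl_is_available is made up for illustration and is not part of PHP or of the original script.

```php
<?php
// Hypothetical helper (name invented for this example): reports whether
// the cURL extension is loaded in the running PHP build.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    echo "cURL is enabled\n";
} else {
    echo "cURL is not enabled; check the extension line in php.ini\n";
}
```

Keep in mind that the command-line PHP and the web server PHP may read different php.ini files, so it is worth running the check in the same environment as the scraper.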

Next comes the code.

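<?php
// File where curl stores the cookies returned by the login request
$cookie_jar = 'c:/cookie.txt';

// Step 1: send the login form
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.plurk.com/m/login');
curl_setopt($ch, CURLOPT_POST, 1);
$request = 'username=davidou123&password=0000';
curl_setopt($ch, CURLOPT_POSTFIELDS, $request);
// Save the cookies that come back into the $cookie_jar file
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
// Return the response as a string instead of printing it automatically
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Whether to include the response headers in the output
curl_setopt($ch, CURLOPT_HEADER, false);
// Whether to skip the page body (false = the body is returned)
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_exec($ch);
// Closing the handle is what writes the cookie file to disk
curl_close($ch);

// Step 2: get data after login, sending the saved cookies along
$ch2 = curl_init();
curl_setopt($ch2, CURLOPT_URL, 'URL to scrape');
curl_setopt($ch2, CURLOPT_HEADER, false);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch2, CURLOPT_COOKIEFILE, $cookie_jar);
$orders = curl_exec($ch2);
// Print the page with the HTML tags stripped out
echo '<pre>';
echo strip_tags($orders);
echo '</pre>';
curl_close($ch2);
?>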

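A few things to note here. The http://www.plurk.com/m/login in the code is the login URL of the site you want to log in to (more precisely, the URL the login form is submitted to, which can differ from the page that displays the form).

The username=davidou123&password=0000 part is the data the login page expects to receive via POST. You have to look at the site's HTML source to see what the input fields are named and what is actually sent back for login, then change this string to match. On some sites the account field may not be called username; it could be uname, usrname, uid, login, and so on.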
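To make that step a bit more concrete, here is a rough sketch of pulling the input names out of a login form and building the POST string from them. The form markup, the field names (uname, passwd, token) and the token value are all invented for the example. A regular expression like this is fragile on real-world HTML, so treat it as a quick way to peek at the names rather than a proper parser.

```php
<?php
// Made-up login form markup for illustration; in practice this would be
// the HTML source of the real login page.
$html = '<form action="/login" method="post">
  <input type="text" name="uname">
  <input type="password" name="passwd">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Log in">
</form>';

// Rough scan for the name attribute of every <input> tag.
preg_match_all('/<input\b[^>]*\bname=["\']([^"\']+)["\']/i', $html, $m);
$input_names = $m[1];
print_r($input_names);

// Build the POST body under the names the form really uses.
// http_build_query URL-encodes each value and joins the pairs with "&".
$request = http_build_query([
    'uname'  => 'davidou123',
    'passwd' => '0000',
    'token'  => 'abc123',
]);
echo $request, "\n";
```

The resulting $request string can be passed to CURLOPT_POSTFIELDS just like the hand-written one. Hidden inputs are worth watching for, since some sites expect them to be posted back along with the account and password.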

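After that, replace the placeholder URL in the second request (the CURLOPT_URL set on $ch2) with the address of the page you want to fetch, and you will get the full source of that page.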
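Since curl_exec returns false when a request fails, a slightly more defensive version of that second request might look like the sketch below. The function name fetch_with_cookies and the choice to treat HTTP status 400 and above as an error are my own assumptions, not something the original script does.

```php
<?php
// Hypothetical helper (name invented for this example): fetch a page
// using the cookies saved by the login request, and fail loudly on errors.
function fetch_with_cookies(string $url, string $cookie_file): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Send the cookies that the login request stored in this file.
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file);
    // Follow redirects, which login-protected pages often issue.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }

    // Assumption for this sketch: any 4xx/5xx status counts as a failure.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($status >= 400) {
        throw new RuntimeException('Unexpected HTTP status: ' . $status);
    }
    return $body;
}
```

It would be called along the lines of $orders = fetch_with_cookies('https://example.com/orders', $cookie_jar); where the example.com address is only a stand-in for the real page.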

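Of course, if you only want certain parts of the page, you will then have to write a preg_match_all call and pick the page apart with regular expressions.

Like this: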

preg_match_all('/<a[^>]*>(.*?)<\/a>/is', $orders, $output, PREG_PATTERN_ORDER);
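To show what ends up in $output, here is a small self-contained sketch. The HTML string is made up and simply stands in for the page source held in $orders, and the pattern is just one possible way to capture links.

```php
<?php
// Made-up page source standing in for $orders from the script above.
$orders = '<ul>'
        . '<li><a href="/order/1">Order #1</a></li>'
        . '<li><a href="/order/2">Order #2</a></li>'
        . '</ul>';

// Group 1 captures the href value, group 2 the link text.
$pattern = '/<a\s+href="([^"]*)"[^>]*>(.*?)<\/a>/is';
$count = preg_match_all($pattern, $orders, $output, PREG_PATTERN_ORDER);

// With PREG_PATTERN_ORDER, $output[0] holds the full matches,
// $output[1] all href values and $output[2] all link texts.
for ($i = 0; $i < $count; $i++) {
    echo $output[1][$i], ' => ', $output[2][$i], "\n";
}
```

For this sample input the loop prints "/order/1 => Order #1" and "/order/2 => Order #2". Real pages are usually messier, so the pattern normally has to be adjusted to the markup of the site being scraped.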

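There is also a converter tool below that turns a curl command copied from Chrome into PHP. I actually recommend this one more. It saves you writing the code yourself, since someone has already done the work for you...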

https://incarnate.github.io/curl-to-php/
