Without further ado, here is the code.
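<?php
session_start();
header("Content-type: text/html;charset=utf-8");

// "URL" is a placeholder for the address of the feed to fetch.
$contents = file_get_contents("URL");

// With PREG_SET_ORDER, $result[$i][1] holds the captured text of the i-th match.
preg_match_all("/<\/?title>(.*?)<\/?title>/", $contents, $result, PREG_SET_ORDER);
preg_match_all("/<\/?pubDate>(.*?)<\/?pubDate>/", $contents, $resultdate, PREG_SET_ORDER);

echo $contents; // dump the raw response
echo "Liberty Times news ticker<br>";

// Print at most the first 10 titles, each with its publication date when one was matched.
$count = min(10, count($result));
for ($i = 0; $i < $count; $i++) {
    echo ($i + 1) . ". " . $result[$i][1];
    if (isset($resultdate[$i][1])) {
        echo " (" . $resultdate[$i][1] . ")";
    }
    echo "<br>";
}

// The original also printed "www.plurk.com/m/p/" . $link[1][$i] and "Next page" . $next[1][0],
// but $link and $next are never assigned in this script.
?>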
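Note that by default the `.` in a preg_match_all pattern does not match newline characters, so if the text you want to capture spans several lines you need to add the s modifier to the pattern.
For example,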
preg_match_all("/<\/?title>(.*?)<\/?title>/", $contents, $result, PREG_SET_ORDER);
should be changed to
preg_match_all("/<\/?title>(.*?)<\/?title>/s", $contents, $result, PREG_SET_ORDER);
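To see the difference the s modifier makes, here is a minimal sketch that runs the same pattern with and without it. The `$sample` string is a made-up input of mine (a title whose text contains a line break), not something taken from the real feed.

```php
<?php
// Hypothetical sample input: a title element whose text spans two lines.
$sample = "<title>first line\nsecond line</title>";

// Without the s modifier, "." stops at the newline, so the pattern cannot match.
$withoutS = preg_match_all("/<\/?title>(.*?)<\/?title>/", $sample, $m1, PREG_SET_ORDER);

// With the s modifier (PCRE_DOTALL), "." also matches newline characters.
$withS = preg_match_all("/<\/?title>(.*?)<\/?title>/s", $sample, $m2, PREG_SET_ORDER);

echo "matches without s: ", $withoutS, PHP_EOL;
echo "matches with s: ", $withS, PHP_EOL;

// With PREG_SET_ORDER, the first capture group of the first match is $m2[0][1].
if ($withS > 0) {
    echo $m2[0][1], PHP_EOL;
}
```

preg_match_all returns the number of full matches, so comparing the two return values is a quick way to check whether a missing modifier is the reason a pattern finds nothing.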
Alternatively, I found that scraping a page through DOM manipulation is also quite pleasant:
<?php
$xml = <<< XML
<?xml version="1.0" encoding="utf-8"?>
<books>
 <book>Patterns of Enterprise Application Architecture</book>
 <book>Design Patterns: Elements of Reusable Software Design</book>
 <book>Clean Code</book>
</books>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);

// Collect every <book> element and print its text content.
$books = $dom->getElementsByTagName('book');
foreach ($books as $book) {
    echo $book->nodeValue, PHP_EOL;
}
?>
Source: http://php.net/manual/en/domdocument.getelementsbytagname.php
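The same DOM approach can be applied to the title and pubDate tags from the first script. The sketch below assumes the feed is RSS-like, with each entry wrapped in an `<item>` element. The `$rss` string is invented sample data standing in for the downloaded content, and the `$rows` array is just a name I picked for illustration.

```php
<?php
// Hypothetical RSS snippet standing in for the downloaded feed.
$rss = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>First headline</title>
      <pubDate>Mon, 01 Jan 2024 08:00:00 +0800</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <pubDate>Mon, 01 Jan 2024 09:00:00 +0800</pubDate>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument;
$dom->loadXML($rss);

// Walk each <item> and read its own <title> and <pubDate> children,
// so the channel-level <title> is not mixed in with the headlines.
$rows = [];
foreach ($dom->getElementsByTagName('item') as $item) {
    $title = $item->getElementsByTagName('title')->item(0);
    $date  = $item->getElementsByTagName('pubDate')->item(0);
    $rows[] = [
        'title' => $title ? trim($title->nodeValue) : '',
        'date'  => $date ? trim($date->nodeValue) : '',
    ];
}

foreach ($rows as $i => $row) {
    echo ($i + 1) . ". " . $row['title'] . " (" . $row['date'] . ")", PHP_EOL;
}
```

Compared with the regex version, scoping the lookup to each `<item>` keeps every title paired with its own date, and line breaks inside a tag need no special handling.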
Or, more directly, with the PHP Simple HTML DOM Parser library:
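// file_get_html() is provided by the PHP Simple HTML DOM Parser library;
// the include path below is an assumption, adjust it to where your copy lives.
include 'simple_html_dom.php';

// file_get_html() already returns a parsed DOM object, so the extra str_get_html() call is not needed.
$html = file_get_html('http://localhost/get.php');

foreach ($html->find('tr') as $element) {
    // Header cells of this row
    $th = array();
    foreach ($element->find('th') as $cell) {
        $th[] = $cell->plaintext;
    }
    print_r($th);

    // Data cells of this row
    $td = array();
    foreach ($element->find('td') as $cell) {
        $td[] = $cell->plaintext;
    }
    print_r($td);
}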
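If you would rather not pull in a third-party library, roughly the same table walk can be done with PHP's built-in DOMDocument. This is a different technique from the Simple HTML DOM code above, not a drop-in replacement, and the `$page` markup is a made-up stand-in for whatever http://localhost/get.php returns.

```php
<?php
// Hypothetical table markup standing in for the fetched page.
$page = '<table>'
      . '<tr><th>Name</th><th>Price</th></tr>'
      . '<tr><td>Clean Code</td><td>30</td></tr>'
      . '</table>';

// Real-world HTML is often not well-formed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($page);

// Build one array of cell texts per table row, covering both <th> and <td> cells.
$table = [];
foreach ($dom->getElementsByTagName('tr') as $tr) {
    $cells = [];
    foreach ($tr->childNodes as $cell) {
        if ($cell instanceof DOMElement && in_array($cell->nodeName, ['th', 'td'], true)) {
            $cells[] = trim($cell->textContent);
        }
    }
    $table[] = $cells;
}

print_r($table);
```

Keeping header and data cells in the same per-row array makes it easy to treat the first row as column names later on.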