Original title: A strategy for parsing favicon locations?
  • Time: 2012-05-23 17:44:40
  • Tags:
  • php

LinkedIn

No link...Use default location.
http://www.linkedin.com/favicon.ico

Twitter

  <link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />

Pinterest

<link rel="icon" href="http://passets-cdn.pinterest.com/images/favicon.png" type="image/x-icon" />

Facebook

<link rel="shortcut icon" href="https://s-static.ak.facebook.com/rsrc.php/yi/r/q9U99v3_saj.ico" />

I have concluded that the only 100% reliable way to find a favicon is to check the page source and see where the link points.

  • Default location is not always used. Note first 2 examples.
  • Google API works only about 85% of the time. Try It Out

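Is there a function that can parse this information out? Or is there a good strategy for extracting it manually with a regex?

I will be parsing the HTML server-side to get this information.

Ideas:

Regex example: try it at http://regexpal.com/. It seems too easy, but here is a starting point.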

<link\srel="[Ss]hortcut [Ii]con"\shref="(.+)"(.+)>
Best answer

Use a parser:

$dom = new DOMDocument();
@$dom->loadHTML($input);
$links = $dom->getElementsByTagName("link");
$l = $links->length;
$favicon = "/favicon.ico";
for( $i=0; $i<$l; $i++) {
    $item = $links->item($i);
    if( strcasecmp($item->getAttribute("rel"),"shortcut icon") === 0) {
        $favicon = $item->getAttribute("href");
        break;
    }
}
// You now have your $favicon
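
The Pinterest example in the question uses rel="icon", which an exact comparison against "shortcut icon" would skip. Here is a minimal sketch that treats rel as a whitespace-separated token list and accepts any link carrying the token "icon". The function name findFaviconHref and its default argument are hypothetical, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href of the
// first <link> whose rel contains the token "icon", or the default location.
function findFaviconHref(string $html, string $default = "/favicon.ico"): string
{
    if (trim($html) === "") {
        return $default; // nothing to parse
    }
    $dom = new DOMDocument();
    @$dom->loadHTML($html); // suppress warnings from malformed markup
    foreach ($dom->getElementsByTagName("link") as $item) {
        // rel is a whitespace-separated token list; compare case-insensitively
        $tokens = preg_split('/\s+/', strtolower(trim($item->getAttribute("rel"))));
        if (in_array("icon", $tokens, true) && $item->getAttribute("href") !== "") {
            return $item->getAttribute("href");
        }
    }
    return $default;
}
```

The returned value may still be a relative path such as /phoenix/favicon.ico, so it has to be resolved against the page URL before it can be fetched.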
Other answers

PHP 5 DOMDocument: original regex

So far, this has worked in every case.

    $pattern = '#<link\s+(?=[^>]*rel="(?:shortcut\s)?icon"\s+)(?:[^>]*href="(.+?)").*>#i';
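
Here is a minimal sketch of applying a pattern like this with preg_match. The pattern is written with explicit quotes, '#' delimiters and \s escapes, which is an assumption about the intended form, and the helper name matchFaviconHref is hypothetical.

```php
<?php
// Assumed form of the pattern: '#' delimiters, \s for whitespace, case-insensitive.
$pattern = '#<link\s+(?=[^>]*rel="(?:shortcut\s)?icon"\s+)(?:[^>]*href="(.+?)").*>#i';

// Hypothetical helper: returns the captured href, or null when nothing matches.
function matchFaviconHref(string $html, string $pattern): ?string
{
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// The Twitter tag quoted in the question.
$tag = '<link href="/phoenix/favicon.ico" rel="shortcut icon" type="image/x-icon" />';
echo matchFaviconHref($tag, $pattern), "\n";
```

Written this way, the lookahead requires whitespace after the rel attribute, so a tag that closes immediately after rel="icon" would not match.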

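You have to work around a few issues, such as site redirects and various other caveats. Here is what I did to get a result for roughly 90% of sites: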

<?php
/*
  nws-favicon : Get site's favicon using various strategies

  This script is part of NWS
  https://github.com/xaccrocheur/nws/

*/


function CheckImageExists($imgUrl) {
    // getimagesize() returns false when the URL is not a readable image
    return (bool) @getimagesize($imgUrl);
}

function getFavicon ($url) {

$fallback_favicon = "/var/www/favicon.ico";

    $dom = new DOMDocument();
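    // loadHTML() expects markup rather than a URL, so fetch the page first.
    $html = @file_get_contents($url);
    if ($html) {
        @$dom->loadHTML($html);
    }
    $links = $dom->getElementsByTagName("link");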
    $l = $links->length;
    $favicon = "/favicon.ico";
    for( $i=0; $i<$l; $i++) {
        $item = $links->item($i);
        if( strcasecmp($item->getAttribute("rel"),"shortcut icon") === 0) {
            $favicon = $item->getAttribute("href");
            break;
        }
    }

    $u = parse_url($url);

    $subs = explode('.', $u['host']);
    $domain = $subs[count($subs) - 2] . '.' . $subs[count($subs) - 1];

    $file = "http://".$domain."/favicon.ico";
    $file_headers = @get_headers($file);

    if($file_headers[0] == 'HTTP/1.1 404 Not Found' || $file_headers[0] == 'HTTP/1.1 404 NOT FOUND' || $file_headers[0] == 'HTTP/1.1 301 Moved Permanently') {

        $fileContent = @file_get_contents("http://".$domain);

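        // loadHTML() is an instance method; calling it statically fails on current PHP.
        $dom = new DOMDocument();
        if ($fileContent) {
            @$dom->loadHTML($fileContent);
        }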
        $xpath = new DOMXpath($dom);

        $elements = $xpath->query("head/link//@href");

        $hrefs = array();

        foreach ($elements as $link) {
            $hrefs[] = $link->value;
        }

        $found_favicon = array();
        foreach ( $hrefs as $key => $value ) {
            if( substr_count($value, 'favicon.ico') > 0 ) {
                $found_favicon[] = $value;
                $icon_key = $key;
            }
        }

        $found_http = array();
        foreach ( $found_favicon as $key => $value ) {
            if( substr_count($value, 'http') > 0 ) {
                $found_http[] = $value;
                $favicon = $hrefs[$icon_key];
                $method = "xpath";
            } else {
                $favicon = $domain.$hrefs[$icon_key];
                if (substr($favicon, 0, 4) != 'http') {
                    $favicon = 'http://' . $favicon;
                    $method = "xpath+http";
                }
            }
        }

        if (isset($favicon)) {
            if (!CheckImageExists($favicon)) {
                $favicon = $fallback_favicon;
                $method = "fallback";
            }
        } else {
            $favicon = $fallback_favicon;
            $method = "fallback";
        }

    } else {
        $favicon = $file;
        $method = "classic";

        if (!CheckImageExists($file)) {
            $favicon = $fallback_favicon;
            $method = "fallback";
        }

    }
    return $favicon;
}

?>
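
The script above builds absolute URLs by prepending the domain to a relative href. Here is a minimal sketch of a more general resolution step, assuming the page URL is an absolute http(s) URL. The function name resolveFaviconUrl is hypothetical, and "../" segments are not normalized.

```php
<?php
// Hypothetical helper: turns an href taken from a <link> tag into an absolute URL.
// Assumes $pageUrl is an absolute http(s) URL.
function resolveFaviconUrl(string $pageUrl, string $href): string
{
    if (preg_match('#^https?://#i', $href) === 1) {
        return $href; // already absolute
    }
    $u = parse_url($pageUrl);
    $scheme = $u['scheme'] ?? 'http';
    $host = $u['host'] ?? '';
    if (isset($u['port'])) {
        $host .= ':' . $u['port'];
    }
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href; // protocol-relative: reuse the page scheme
    }
    if (strpos($href, '/') === 0) {
        return $scheme . '://' . $host . $href; // root-relative
    }
    // Otherwise resolve against the directory of the page path.
    $path = $u['path'] ?? '/';
    $pos = strrpos($path, '/');
    $dir = ($pos === false) ? '/' : substr($path, 0, $pos + 1);
    return $scheme . '://' . $host . $dir . $href;
}
```

For example, resolveFaviconUrl("http://twitter.com/", "/phoenix/favicon.ico") yields "http://twitter.com/phoenix/favicon.ico", which can then be passed to a check such as CheckImageExists().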



