Order nodes by most images?

This might sound a bit complicated, but what I want to do is find all <a>s that contain <img>s, ordered so that the images sharing a node with the greatest number of other images are chosen first.

For example, if my page looks like this:

http://img684.imageshack.us/img684/5678/imagechart.gif

If the blue squares are <div>s and the pink squares are <img>s, then the middle div contains the most images, so those images are chosen first. Since they aren't nested any deeper than that, they just appear in the order that they are on the page. Next the first div is chosen (it contains the second most images), and so forth. Does that make sense?

We can think of it sort of recursively. First the body would be chosen, since that will always contain the most images. Then each of its direct children is examined to see which contains the most image descendants (not necessarily direct), then we go into that node, and repeat...

Best answer

Current solution:

    private static int Count(HtmlNodeCollection nc) {
        return nc == null ? 0 : nc.Count;
    }

    private static void BuildList(HtmlNode node, ref List<HtmlNode> list) {
        var sortedNodes = from n in node.ChildNodes
                          orderby Count(n.SelectNodes(".//a[@href and img]")) descending
                          select n;
        foreach (var n in sortedNodes) {
            if (n.Name == "a") list.Add(n);
            else if (n.HasChildNodes) BuildList(n, ref list);
        }
    }

Example usage:

    private static void ProcessDocument(HtmlDocument doc, Uri baseUri) {
        var linkNodes = new List<HtmlNode>(100);
        BuildList(doc.DocumentNode, ref linkNodes);
        // ...

It's a bit inefficient though, because it does a lot of recounting: every level of the recursion re-runs the XPath count over each child's whole subtree. But oh well.
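If the recounting ever matters, one option is to count each subtree once in a single post-order pass and sort on the cached numbers. The following is only a sketch under a few assumptions: it is written against `System.Xml`'s `XmlNode` so that it runs without extra packages (the same shape should carry over to HtmlAgilityPack's `HtmlNode`), the names `ImageLinkOrdering`, `CountLinks`, `IsImageLink` and `OrderedImageLinks` are made up for illustration, and unlike the code above it only collects anchors that pass the same `a[@href and img]` test used for counting.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class ImageLinkOrdering
{
    // Same test as the XPath predicate a[@href and img]:
    // an <a> with an href attribute and a direct <img> child.
    private static bool IsImageLink(XmlNode node)
    {
        return node.Name == "a"
            && node.Attributes?["href"] != null
            && node.SelectSingleNode("img") != null;
    }

    // Post-order pass. Stores, for every node, the number of image links
    // strictly below it (what ".//a[@href and img]" would return), and
    // hands the parent that number plus one if the node itself is a link.
    private static int CountLinks(XmlNode node, Dictionary<XmlNode, int> counts)
    {
        int below = 0;
        foreach (XmlNode child in node.ChildNodes)
            below += CountLinks(child, counts);
        counts[node] = below;
        return below + (IsImageLink(node) ? 1 : 0);
    }

    private static void BuildList(XmlNode node, Dictionary<XmlNode, int> counts, List<XmlNode> list)
    {
        // OrderByDescending is a stable sort, so ties keep document order.
        var sorted = node.ChildNodes.Cast<XmlNode>().OrderByDescending(c => counts[c]);
        foreach (XmlNode child in sorted)
        {
            if (IsImageLink(child)) list.Add(child);
            else if (child.HasChildNodes) BuildList(child, counts, list);
        }
    }

    // Hypothetical entry point: returns the image links under root,
    // busiest containers first.
    public static List<XmlNode> OrderedImageLinks(XmlNode root)
    {
        var counts = new Dictionary<XmlNode, int>();
        CountLinks(root, counts);
        var list = new List<XmlNode>();
        BuildList(root, counts, list);
        return list;
    }
}
```

Each node is visited once for counting and once for ordering, so the repeated subtree scans go away at the cost of one dictionary entry per node.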

Answers

You could try looking at the count of images for every node.

    public static XmlNode FindNodeWithMostImages(XmlNodeList nodes) {

        var greatestImageCount = 0;
        XmlNode nodeWithMostImages = null;

        foreach (XmlNode node in nodes)
        {
            // Count every descendant <img>; "*/child::img" would only see grandchildren.
            var currentNodeImageCount = node.SelectNodes(".//img").Count;

            if (currentNodeImageCount > greatestImageCount)
            {
                greatestImageCount = currentNodeImageCount;
                nodeWithMostImages = node;
            }
        }

        return nodeWithMostImages;
    }

XPath 1.0 does not provide the ability to sort a collection. You will need to combine XPath with something else.
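In C#, one candidate for that "something else" is LINQ: XPath picks the nodes and LINQ orders them. A minimal sketch, assuming a `System.Xml` DOM; the class and method names (`ImageSort`, `LinksByImageCount`) are invented for the example:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class ImageSort
{
    // XPath selects the candidates; LINQ supplies the ordering that
    // XPath 1.0 itself cannot express.
    public static List<XmlNode> LinksByImageCount(XmlNode root)
    {
        return root.SelectNodes(".//a[descendant::img]")
                   .Cast<XmlNode>()
                   .OrderByDescending(a => a.SelectNodes(".//img").Count)
                   .ToList();
    }
}
```

`OrderByDescending` is a stable sort, so links with the same image count stay in document order. Note that this gives a flat ordering by image count, not the nested container-by-container ordering described in the question.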

Here is an example XSLT solution that finds all elements that contain descendant <img> elements, and then sorts them by the count of their descendant <img> elements in descending order.

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">
        <!-- if you only want <a>, select //a[descendant::img] instead -->
        <xsl:for-each select="//*[descendant::img]">
            <!-- data-type="number" is needed: xsl:sort compares as text by default -->
            <xsl:sort select="count(descendant::img)" data-type="number" order="descending" />

            <!-- Example output to demonstrate which elements have been selected -->
            <xsl:value-of select="name()"/><xsl:text> has </xsl:text>
            <xsl:value-of select="count(.//img)" />
            <xsl:text> descendant images&#10;</xsl:text>

        </xsl:for-each>

    </xsl:template>

    </xsl:stylesheet>

It wasn't clear to me from your question and examples whether you want to find any element with descendant <img> elements or just <a> elements with descendant <img> elements.

If you only want <a> elements with descendant <img> elements, adjust the XPath in the for-each to select: //a[descendant::img]




