English 中文(简体)
改变从屏幕报废器收集的数据的建议
原标题:advice for transforming data gathered from screen scrapers

良好的民俗,

我有我的过滤器(碎片)收集多个财产网站的财产清单数据。 他们都有几个共同的领域,如价格、地表面积等。 然而,如同所有废弃数据一样,这些领域中的价值现在相当不可取。 例如,price 我有明显的价值观,如<代码> 1 000 000 000 000 000 000美元,但我也像1 000 000 000 000 000美元。 Price on Ask and Price on Ask. 因此,目前,我把所有废田储存在我的数据库中,作为果园。

我想将我数据库中的这些密集领域从特性转变为适当类型,例如直线,因此我可以相应地将其索引。 谁能向我提供什么是明智的程序和方法开始改变数据?

问题回答

You want to throw away the "Price On Ask" string? Or is that valuable information?

如果在数据中有许多噪音,而且这是完全没有兴趣的,那么我就有一个过滤器去除所有非数字。

但是,如果时间允许,我更愿意将数据与模式匹配明确处理(组合代码为PHP):

//$price is raw string
$price=str_replace( , ,  ,$price);    //Get rid of commas
$price=str_replace( $ ,  ,$price);    //Get rid of dollar signs

if($price== Price On Ask )$price=null;
elseif(preg_match( /^d+$/ ,$price))$price=(int)$price;  //Simple number
elseif(preg_match( /^(d+) Price On Ask$/i ,$price,$parts)){
   $price=(int)$parts[1];
   }
else{
   echo "Unexpected price string: $price
";
   $price=null;
   }

然后,我有为一些扼杀装置设定旗帜的结构。 此外,如果数据中出现新的插图,那么我的文字就会听觉,我可以决定是不是。

(说明:将价格定为无效,意味着将国家扫盲十年列入数据库,而不是零)。)





相关问题
what is wrong with this mysql code

$db_user="root"; $db_host="localhost"; $db_password="root"; $db_name = "fayer"; $conn = mysqli_connect($db_host,$db_user,$db_password,$db_name) or die ("couldn t connect to server"); // perform query ...

Users asking for denormalized database

I am in the early stages of developing a database-driven system and the largest part of the system revolves around an inheritance type of relationship. There is a parent entity with about 10 columns ...

Easiest way to deal with sample data in Java web apps?

I m writing a Java web app in my free time to learn more about development. I m using the Stripes framework and eventually intend to use hibernate and MySQL For the moment, whilst creating the pages ...

join across databases with nhibernate

I am trying to join two tables that reside in two different databases. Every time, I try to join I get the following error: An association from the table xxx refers to an unmapped class. If the ...

How can I know if such value exists in database? (ADO.NET)

For example, I have a table, and there is a column named Tags . I want to know if value programming exists in this column. How can I do this in ADO.NET? I did this: OleDbCommand cmd = new ...

Convert date to string upon saving a doctrine record

I m trying to migrate one of my PHP projects to Doctrine. I ve never used it before so there are a few things I don t understand. In my current code, I have a class similar to this: class ...

热门标签