Bigquery - Remove Duplicate
Original title: Bigquery - Remove Duplicate using

I have a table where some records may be duplicates.

Table: dashboard.scrappy.imoveis

| data | imobiliaria | negocio | codigo | cidade | uf | tipo | bairro | url | endpoint | total | valor |
| --------- | ----------- | ------- | ------ | ------ | ------ | ------ | ------ | ------ | -------- | ----- | ----- |
| timestamp | string | string | string | string | string | string | string | string | string | float | float |

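Duplicate records are identified by 4 columns (data, imobiliaria, negocio, codigo).

The table has no index (auto-increment) column.

I tried to use the following statement (to delete the duplicates, leaving 1 record):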

DELETE FROM `dashboard.scrappy.imoveis`
WHERE ((ROW_NUMBER() OVER (PARTITION BY `data`, `imobiliaria`, `negocio`, `codigo`) > 1) AND (COUNT(*) > 1 OR (COUNT(*) = 1 AND ROW_NUMBER() OVER (PARTITION BY `data`, `imobiliaria`, `negocio`, `codigo`) = 1)))
GROUP BY `data`, `imobiliaria`, `negocio`, `codigo`;

But it is giving the error: Syntax error: Syntax error: Expected end of input but got keyword GROUP at [3:1]

Any suggestions on how to fix this?

Answers

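Since GROUP BY cannot be used in a DELETE statement, you have to rephrase the query. Aggregate and window functions cannot be used in the WHERE clause either.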
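As a sketch of where a window function can act as a filter: BigQuery's QUALIFY clause is evaluated after window functions, so a SELECT can use it to preview the rows a deduplication would drop. The table and column names come from the question; the `WHERE TRUE` is included defensively, in case QUALIFY needs an accompanying clause in your environment.

```sql
-- Preview the extra copies: every row after the first in each key group.
-- Without an ORDER BY in the window, which copy counts as "first" is arbitrary.
SELECT *
FROM `dashboard.scrappy.imoveis`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY `data`, `imobiliaria`, `negocio`, `codigo`
) > 1;
```

This only reads the table, so it is safe to run before changing anything.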

The easiest way to remove rows that are duplicated across all columns is to use DISTINCT:

 Select distinct * from table_name 

This will give you the unique rows. You can store the result temporarily in a different table or replace the existing table:

create or replace table table_name as
select distinct * from table_name;
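Note that `distinct *` only collapses rows that match in every column, while the question defines a duplicate by four columns. It may help to check how many key groups actually repeat, before and after any rewrite. A minimal sketch using the question's table and column names (the alias `copies` is made up here):

```sql
-- List each key combination that occurs more than once, with its row count.
SELECT `data`, `imobiliaria`, `negocio`, `codigo`, COUNT(*) AS copies
FROM `dashboard.scrappy.imoveis`
GROUP BY `data`, `imobiliaria`, `negocio`, `codigo`
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

If this returns no rows, no key group has more than one record.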

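Or you can number the rows within each duplicate group in a CTE and keep only the first one. BigQuery cannot delete from a CTE, so the table is rewritten instead:

create or replace table table_name as
with cte as
(select *,
ROW_NUMBER() OVER (PARTITION BY `data`, `imobiliaria`, `negocio`, `codigo`) as dup_row from table_name)
select * except (dup_row) from cte where dup_row = 1;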
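Applied to the table from the question, the same row-number idea could look like the sketch below. It writes the deduplicated rows to a separate table first, so the result can be inspected before anything is overwritten. The target name `imoveis_dedup` and the `ORDER BY valor DESC` tie-breaker are assumptions for illustration, not part of the original question.

```sql
-- Keep one row per (data, imobiliaria, negocio, codigo) group.
-- Assumption: when copies differ, the one with the highest `valor` is kept.
CREATE OR REPLACE TABLE `dashboard.scrappy.imoveis_dedup` AS
SELECT *
FROM `dashboard.scrappy.imoveis`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY `data`, `imobiliaria`, `negocio`, `codigo`
  ORDER BY `valor` DESC
) = 1;
```

Choosing an explicit ORDER BY makes the surviving row predictable; without it, any one of the copies may be kept.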



