I have a DF from `df.summary()`:
+-------+-----------+-----------+
|summary|       col1|       col2|
+-------+-----------+-----------+
|  count|       1000|       1000|
|   mean|  45.678923|  67.890123|
| stddev|   7.123456|   9.234567|
|    min|      32.45|      54.23|
|    25%|      40.12|      63.45|
|    50%|      45.67|      67.89|
|    75%|      50.23|      72.34|
|    max|      58.90|      87.65|
+-------+-----------+-----------+
I also have a second DF with some custom metrics, as a single wide row:
distinct_col1, complete_col1, distinct_col2, complete_col2, distinct_col3, complete_col3, max_col3, min_col3
989, 1000, 1000, 1000, 540, 1000, '2023-10-01', '2021-01-01'
How can I join the two DFs to get the following output?
+-------+-----------+-----------+-----------+
|summary|       col1|       col2|       col3|
+-------+-----------+-----------+-----------+
|  count|       1000|       1000|       1000|
| dstnct|        989|       1000|        540|
| complt|       1000|       1000|       1000|
|   mean|  45.678923|  67.890123|           |
| stddev|   7.123456|   9.234567|           |
|    min|      32.45|      54.23| 2021-01-01|
|    25%|      40.12|      63.45|           |
|    50%|      45.67|      67.89|           |
|    75%|      50.23|      72.34|           |
|    max|      58.90|      87.65| 2023-10-01|
+-------+-----------+-----------+-----------+
I've tried some queries using spark.sql with DESCRIBE, but with no success.