Detecting outlier points that do not fit logarithm function
  • Date: 2023-12-29 16:50:43
  • Tags:
  • r
  • outliers

I have a data set that looks like this: [image: plot of the data]

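My goal is to fit a logarithmic curve to the points that follow the main trend while ignoring the rest, and to identify those remaining points. Basically, I need an algorithm that can detect when points do not follow the prevailing pattern.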

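I tried computing the derivative at each point of the plot, and I also tried fitting a line and then computing the residuals. Neither approach was able to identify which points are outliers.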

Reproducible data:

df <- structure(list(x = c(0.203511, 0.18356, 0.230625, 0.183559, 
0.183559, 0.18355, 0.183558, 0.183549, 0.183549, 0.252141, 0.206343, 
0.183551, 0.183559, 0.206392, 0.208928, 0.183547, 0.206628, 0.206861, 
0.209157, 0.18356, 0.18356, 0.183549, 0.322667, 0.18356, 0.183551, 
0.208688, 0.183549, 0.183548, 0.183548, 0.183559, 0.199, 0.183559, 
0.237485, 0.27509, 0.326279, 0.23, 0.367, 0.230508, 0.253799, 
0.196695, 0.183553, 0.265886, 0.244, 0.183552, 0.23, 0.322667, 
0.188143, 0.222, 0.205693, 0.245927, 0.183553, 0.333, 0.189656, 
0.18554, 0.183552, 0.367, 0.183553, 0.194093, 0.191298, 0.181536, 
0.183689, 0.192226, 0.204093, 0.269, 0.23291, 0.203545, 0.197, 
0.239, 0.233, 0.222, 0.185717, 0.191805, 0.26948, 0.24, 0.195, 
0.226951, 0.284, 0.235932, 0.195922, 0.184, 0.195935, 0.210763, 
0.197096, 0.197493, 0.21671, 0.195947, 0.21039, 0.308194, 0.212768, 
0.222379, 0.20945, 0.227, 0.219721, 0.191805, 0.268352, 0.323481, 
0.305, 0.2, 0.186, 0.811116, 0.23, 0.207, 0.236418, 0.243, 0.23, 
0.253, 0.195, 0.193, 0.395364, 0.314, 0.231, 0.195, 0.403, 0.243, 
0.24, 0.197246, 0.202, 0.217, 0.323877, 0.256, 0.193, 0.222656, 
0.224, 0.271, 0.292221, 0.185, 0.23, 0.273, 0.212149, 0.203, 
0.192, 0.23, 0.252, 0.19, 0.191805, 0.248, 0.183552, 0.184, 0.209217, 
0.185, 0.254, 0.199, 0.204, 0.183651, 0.201, 0.22, 0.255, 0.213, 
0.183553, 0.255291, 0.295301, 0.284, 0.23, 0.208, 0.286, 0.48, 
0.206, 0.191679, 0.23, 0.184, 0.195357, 0.184, 0.25815, 0.261, 
0.230229, 0.184, 0.253, 0.24875, 0.239, 0.242, 0.364462, 0.183925, 
0.217, 0.248, 0.245, 0.218719, 0.273, 0.19, 0.221632, 0.259, 
0.196, 0.212, 0.198, 0.249263, 0.322493, 0.306, 0.316, 0.66, 
0.25, 0.269, 0.231, 0.23, 0.23, 0.184, 0.243, 0.253259, 0.313, 
0.19, 0.227, 0.184, 0.183648, 0.239, 0.245757, 0.203, 0.212, 
0.19, 0.278765, 0.211, 0.192, 0.294992, 0.256297, 0.23, 0.188, 
0.26, 0.267956, 0.23, 0.256, 0.241, 0.206, 0.23, 0.23, 0.242, 
0.264573, 0.330066, 0.198, 0.219, 0.23, 0.329, 0.252, 0.273, 
0.23, 0.23, 0.23, 0.288243, 0.34749, 0.23, 0.201, 0.218, 0.253889, 
0.183554, 0.183552, 0.231, 0.214, 0.206483, 0.323926, 0.323, 
0.206192, 0.282, 0.208712, 0.249, 0.185637, 0.184, 0.245, 0.352793, 
0.239186, 0.184, 0.251484, 0.184, 0.489818, 0.219, 0.228, 0.196, 
0.343198, 0.203, 0.252, 0.253, 0.322143, 0.432, 0.241, 0.191, 
0.202, 0.20648, 0.23458, 0.339763, 0.207, 0.23, 0.225, 0.212, 
0.231, 0.243, 0.23, 0.3, 0.398848, 0.184, 0.470245, 0.322318, 
0.272, 0.23, 0.234, 0.284, 0.23, 0.243, 0.253, 0.287, 0.2075, 
0.201083, 0.26, 0.329, 0.23, 0.207, 0.237, 0.241, 0.273, 0.23, 
0.252, 0.23, 0.243, 0.24, 0.312884, 0.259, 0.288, 0.260141, 0.242672, 
0.256448, 0.287, 0.190091, 0.204, 0.762655, 0.315, 0.239259, 
0.208514, 0.23906, 0.184, 0.263, 0.254, 0.19, 0.198, 0.183559, 
0.241, 0.213, 0.23, 0.208, 0.284, 0.401, 0.209, 0.23, 0.23, 0.194, 
0.221635, 0.308, 0.23, 0.191804, 0.195, 0.282, 0.205232, 0.276, 
0.231, 0.23, 0.264, 0.234518, 0.193, 0.255, 0.318, 0.338, 0.23, 
0.221891, 0.28995, 0.23, 0.185, 0.252, 0.298, 0.344192, 0.214604, 
0.254, 0.394567, 0.23746, 0.184222, 0.23, 0.23, 0.191, 0.228, 
0.19, 0.232791, 0.184, 0.23, 0.239539, 0.195, 0.184, 0.184, 0.182, 
0.193, 0.184, 0.284, 0.194905, 0.200346, 0.184, 0.319, 0.208, 
0.237, 0.239271, 0.184, 0.320232, 0.413, 0.228, 0.298419, 0.221637, 
0.23, 0.23, 0.281, 0.289, 0.197, 0.198, 0.232, 0.195, 0.23, 0.201738, 
0.183599, 0.207, 0.239541, 0.23, 0.234506, 0.229), y = c(212100, 
207300, 218000, 207300, 207300, 207300, 207300, 207300, 207300, 
222200, 212700, 207300, 207300, 212700, 213200, 207300, 212700, 
212700, 213200, 207300, 207300, 207300, 234700, 207300, 207300, 
213200, 207300, 207300, 207300, 207300, 211000, 207300, 219500, 
226600, 235100, 217800, 241200, 217800, 222500, 210600, 207200, 
224900, 220500, 207200, 217800, 234700, 208300, 216100, 212500, 
221000, 207200, 236400, 208700, 207800, 207200, 241200, 207200, 
209800, 209200, 206800, 207300, 209300, 212200, 225400, 218400, 
212100, 210600, 219700, 218400, 216100, 207800, 209300, 225600, 
219800, 210000, 217300, 228100, 219000, 210200, 207300, 210200, 
213700, 210600, 210700, 214900, 210200, 213600, 232100, 214000, 
216300, 213400, 217300, 215600, 209300, 225400, 234900, 231800, 
211200, 208000, 287400, 217800, 212700, 219100, 220300, 217800, 
222500, 210000, 209500, 245400, 233200, 218000, 210000, 246300, 
220300, 219800, 210700, 211700, 215100, 224600, 223000, 209500, 
216300, 216500, 225800, 229400, 207500, 217800, 226200, 213900, 
211900, 209300, 217800, 222400, 208900, 209300, 221500, 207200, 
207300, 213200, 207500, 222700, 211000, 212200, 207300, 211400, 
215700, 222800, 214000, 207200, 222800, 230100, 228100, 217800, 
213100, 228600, 256100, 212500, 209300, 217800, 207300, 210000, 
207300, 223500, 223900, 217800, 207300, 222500, 221500, 219700, 
220100, 230500, 207300, 215100, 221500, 220800, 215200, 226200, 
208900, 216100, 223500, 210200, 213900, 210700, 221600, 234500, 
231900, 233400, 274700, 221800, 225400, 218000, 217800, 217800, 
207300, 220300, 222500, 233000, 208900, 217300, 207300, 207300, 
219700, 221000, 211900, 213900, 208900, 227300, 213700, 209300, 
230100, 223000, 217800, 208300, 223700, 225200, 217800, 223000, 
219900, 212500, 217800, 217800, 220100, 224700, 235900, 210700, 
215600, 217800, 235700, 222400, 226200, 217800, 217800, 217800, 
228800, 238400, 217800, 211400, 215200, 222500, 207200, 207200, 
218000, 214400, 212700, 224600, 234700, 212500, 227700, 213200, 
221600, 207800, 207300, 220800, 239300, 219700, 207300, 222200, 
207300, 257300, 215600, 217400, 210200, 237900, 211900, 222400, 
222500, 234500, 250200, 219900, 209200, 211700, 212700, 218800, 
237200, 212700, 217800, 216600, 213900, 218000, 220300, 217800, 
230800, 235100, 207300, 254900, 234500, 226000, 217800, 218600, 
228100, 217800, 220300, 222500, 228800, 213100, 211400, 223700, 
235700, 217800, 212700, 219100, 219900, 226200, 217800, 222400, 
217800, 220300, 219800, 233000, 223500, 228900, 223900, 220300, 
223000, 228800, 208900, 212200, 283400, 233300, 219800, 213200, 
219700, 207300, 224200, 222700, 208900, 210700, 207200, 219900, 
214000, 217800, 213100, 228100, 246100, 213200, 217800, 217800, 
209800, 216100, 232100, 217800, 209300, 210000, 227700, 212300, 
226700, 218000, 217800, 224300, 218800, 209500, 222800, 234000, 
237000, 217800, 216100, 229100, 217800, 207500, 222400, 230600, 
227700, 214600, 222700, 234600, 219500, 207300, 217800, 217800, 
209200, 217400, 208900, 218400, 207300, 217800, 219800, 210000, 
207300, 207300, 206800, 209500, 207300, 228100, 210000, 211200, 
207300, 234000, 213100, 219100, 219800, 207300, 234300, 247700, 
217400, 230600, 216100, 217800, 217800, 227500, 229100, 210600, 
210700, 218200, 210000, 217800, 211700, 207300, 212700, 219800, 
217800, 218800, 217600)), class = "data.frame", row.names = c(NA, 
-412L))
Best answer

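How about something like this? Fit y ~ log(x) with lm(), flag every point whose residual is more than 3 standard deviations from zero, and draw the log curve fitted to the remaining points: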

library(ggplot2)
library(dplyr)

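df %>%
  # Fit y ~ log(x) to all points and keep each point's residual
  mutate(residual = resid(lm(y ~ log(x), data = .)),
         # Flag points more than 3 residual standard deviations from the fit
         outlier = ifelse(abs(residual) > (3 * sd(residual)), "Yes", "No")) %>% 
  ggplot(aes(x = x, y = y, color = outlier)) +
  geom_point(aes(shape = outlier), size = 5, alpha = 0.5) +
  # Draw the log curve fitted to the non-outlier points only
  geom_smooth(data = . %>% filter(outlier == "No"), 
              method = "lm", formula = y ~ log(x), se = FALSE) +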
  scale_color_manual(values = c("No" = "steelblue3", "Yes" = "red3")) +
  scale_shape_manual(values = c("No" = 16, "Yes" = 17)) +
  theme_bw() +
  theme(text = element_text(size = 21))
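The rule above uses a single fit on all points, so the outliers themselves pull the curve toward them and inflate sd(residual). A minimal base-R sketch of a variant, assuming all x > 0 and using a hypothetical helper name flag_log_outliers, refits the log curve on the current inliers only and re-flags until the set of outliers stops changing. This iterative trimming is a swapped-in technique, not part of the original answer.

```r
# Hypothetical helper (not from the original answer): iterative trimming
# around a y = a + b * log(x) fit. Returns a logical vector, TRUE = outlier.
flag_log_outliers <- function(x, y, k = 3, max_iter = 10) {
  stopifnot(length(x) == length(y), all(x > 0))
  outlier <- rep(FALSE, length(x))
  for (i in seq_len(max_iter)) {
    # Fit the log curve using only the points currently treated as inliers
    fit <- lm(y ~ log(x), data = data.frame(x = x[!outlier], y = y[!outlier]))
    # Residuals of *all* points relative to that fit
    res <- y - unname(predict(fit, newdata = data.frame(x = x)))
    # Estimate the spread from inliers only, so outliers do not inflate it
    s <- sd(res[!outlier])
    new_outlier <- abs(res) > k * s
    # Stop once the flagged set is stable
    if (identical(new_outlier, outlier)) break
    outlier <- new_outlier
  }
  outlier
}
```

With the question's data this would be called as `df$outlier <- flag_log_outliers(df$x, df$y)`. Because the spread is re-estimated without the flagged points, the cutoff can tighten on later iterations, so it may flag more points than the single-pass rule; `k` controls how strict it is.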

