Extract a single, numbered table from PDF using PDE package
  • Time: 2023-10-03 19:05:52
  • Tags: r

I have a PDF that I am processing with the PDE package. It works, but not quite the way I would like.

library(PDE)

myTables <- PDE_pdfs2table(pdf = "GPI-2023-Web.pdf")
Following file is processing:  GPI-2023-Web.pdf 
No filter words chosen for analysis.
The following table was detected but not processable for extraction: Table 3.2 shows a breakdown of the change in the e
27 table(s) found in  GPI-2023-Web.pdf .
Analysis of  GPI-2023-Web.pdf  complete.

This extracts all of the tables and writes them as individual CSV files into a subdirectory called tables.

cd tables/
[tables]$ ls
GPI-2023-Web_#010_table1.csv        GPI-2023-Web_#024_table3.csv
GPI-2023-Web_#011_table1.csv        GPI-2023-Web_#025_table1.csv
GPI-2023-Web_#012_table1.csv        GPI-2023-Web_#026_table1.csv
GPI-2023-Web_#013_table3.csv        GPI-2023-Web_#027_table1.csv
GPI-2023-Web_#014_table3.csv        GPI-2023-Web_#02_table1.csv
GPI-2023-Web_#015_table3.csv        GPI-2023-Web_#03_table1.csv
GPI-2023-Web_#017_table3.csv        GPI-2023-Web_#04_table1.csv
GPI-2023-Web_#018_table3.csv        GPI-2023-Web_#05_table1.csv
GPI-2023-Web_#019_table3.csv        GPI-2023-Web_#06_table1.csv
GPI-2023-Web_#01_table1.csv     GPI-2023-Web_#07_table1.csv
GPI-2023-Web_#020_table3.csv        GPI-2023-Web_#08_table1.csv
GPI-2023-Web_#021_table3.csv        GPI-2023-Web_#09_table1.csv
GPI-2023-Web_#022_table1.csv        GPI-2023-Web_page39_w.table-000039.png
GPI-2023-Web_#023_table2.csv
[tables]$ grep -l "Safety and Security domain" *.csv
GPI-2023-Web_#011_table1.csv
GPI-2023-Web_#01_table1.csv
GPI-2023-Web_#023_table2.csv
GPI-2023-Web_#03_table1.csv
[tables]$ vi GPI-2023-Web_#01_table1.csv

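While I can then pick out the specific table I need and post-process it, I would like to extract just one specific table, "Table 1.1: Safety and Security domain", and nothing else.

Is this possible?

Using PDE_pdfs2table_searchandfilter with search.words, filter.words and a few other parameters did not really work for me. It still extracted many tables.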

PS: The above PDF file can be downloaded from here: GPI-2023-Web.pdf

Answers

For this specific example:

search.words = "TABLE 1\\.1\\b"

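The first escape sequence, \\. (the double backslash in the R string is evaluated to a single backslash before it is passed to the regex engine), matches a literal dot character. An unescaped . is a special character that matches any single character, so the regex 1.1 (no escaping) matches 1.1 but also 101.

The second escape sequence, \\b, stands for a word boundary. Without it, the regex 1\.1 matches 1.1 but also 1.11 (a partial match).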
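The effect of each escape can be checked in base R without PDE. The sketch below uses made-up caption strings (they are illustrative and not taken from the PDF) and assumes that the search words behave like ordinary R regular expressions.

```r
# Illustrative captions; only the first is the table we want.
captions <- c(
  "TABLE 1.1: Safety and Security domain",
  "TABLE 101: hypothetical caption",
  "TABLE 1.11: hypothetical caption"
)

# The R string "1\\.1" holds four characters: 1 \ . 1 (a single backslash).
pattern_length <- nchar("1\\.1")

# Unescaped dot: "." matches any character, so all three captions match.
unescaped <- grepl("TABLE 1.1", captions)

# Escaped dot only: "TABLE 101" is rejected, "TABLE 1.11" still matches.
escaped_dot <- grepl("TABLE 1\\.1", captions)

# Escaped dot plus word boundary: only "TABLE 1.1" matches.
with_boundary <- grepl("TABLE 1\\.1\\b", captions)

print(data.frame(captions, unescaped, escaped_dot, with_boundary))
```

Only the last pattern singles out the first caption, which is why both escapes are needed.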

The call to PDE_pdfs2table_searchandfilter could look like this (arguments whose values match the defaults are commented out):

PDE_pdfs2table_searchandfilter(
    pdf = "GPI-2023-Web.pdf",
    search.words = "TABLE 1\\.1\\b", # short for c("TABLE 1\\.1\\b")
    #ignore.case.sw = FALSE, # search words are case sensitive (default)
    #regex.sw = TRUE, # use regex rules for search words
    eval.abbrevs = FALSE, # don't detect abbreviations, use search words as they are
    exp.nondetc.tabs = FALSE, # don't save images for tables that failed to be read
    write.tab.doc.file = FALSE # don't write info about tables that failed to be read
)
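If more than one CSV file is still written, the grep -l step from the question can be reproduced in base R with the same pattern. This is a minimal sketch, not part of PDE: find_tables is a hypothetical helper, and it assumes the caption text appears somewhere inside the exported CSV files. The demo runs on two made-up files in a temporary directory.

```r
# Hypothetical helper: return the CSV files in `dir` whose text matches `pattern`.
find_tables <- function(dir, pattern) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  hits <- vapply(
    files,
    function(f) any(grepl(pattern, readLines(f, warn = FALSE))),
    logical(1)
  )
  files[hits]
}

# Self-contained demo with made-up file contents.
demo_dir <- file.path(tempdir(), "tables_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("TABLE 1.1 Safety and Security domain", "a,b"),
           file.path(demo_dir, "t1.csv"))
writeLines(c("TABLE 1.11 hypothetical caption", "c,d"),
           file.path(demo_dir, "t2.csv"))

matches <- find_tables(demo_dir, "TABLE 1\\.1\\b")
print(basename(matches))
```

For the real output, the call would be find_tables("tables", "TABLE 1\\.1\\b"), with the directory name taken from the question.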



