Extract Pie Chart (other similar charts) data from PDF using Python

Please help me extract the pie chart data from the attached PDF using Python. PDF: https://i.dell.com/sites/csdocuments/CorpComm_Docs/en/carbon-footprint-poweredge-m630.pdf

Thanks in advance, Santhosh

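I have tried the PyPDF2 and PyMuPDF libraries. Since I am new to this, I have not yet identified all the existing methods and other libraries that could achieve it.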

Answers

Here is one way to approach it with Python libraries (PyMuPDF or pdf2image, plus OpenCV).

  1. Convert the PDF pages to images using PyMuPDF or pdf2image.
  2. If you choose pdf2image, you can install it with: pip install pdf2image
  3. Use an image processing library such as OpenCV or PIL to analyze the pie chart image. Once the image has been extracted, image analysis techniques can be used to determine the size or percentage of each slice in the pie chart.
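
Steps 1 and 2 might look like the following minimal sketch, which uses PyMuPDF. The function names `render_pages` and `page_image_name`, the 200 dpi default and the file-naming pattern are illustrative assumptions, not something taken from the PDF or the libraries.

```python
from pathlib import Path


def page_image_name(pdf_path, page_index):
    """Build a name such as 'report-page-001.png' (the pattern is an assumption)."""
    return f"{Path(pdf_path).stem}-page-{page_index + 1:03d}.png"


def render_pages(pdf_path, out_dir, dpi=200):
    """Render every page of the PDF to a PNG file and return the written paths."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with fitz.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            pixmap = page.get_pixmap(dpi=dpi)  # rasterise the whole page
            target = out_dir / page_image_name(pdf_path, index)
            pixmap.save(str(target))
            written.append(target)
    return written
```

Note that this rasterises the full page, so the chart is resampled at the chosen dpi rather than kept at its native resolution.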

Here's a small example using OpenCV:

import cv2

# Load the image and convert it to grayscale
image_path = "path/to/your/image.png"
image = cv2.imread(image_path)
if image is None:
    raise FileNotFoundError(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Thresholding (inverted, so darker regions become the white foreground)
_, thresh = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY_INV)

# Find the external contours
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Print the number of contours (only a rough proxy for the slices in the pie chart)
print("Number of slices:", len(contours))

# Draw the contours on the image
cv2.drawContours(image, contours, -1, (0, 255, 0), 2)

# Display the image until a key is pressed
cv2.imshow("Image with Contours", image)
cv2.waitKey(0)
cv2.destroyAllWindows()
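
Counting contours gives the number of regions, not their sizes. One way to estimate each slice's share is to count pixels per color, assuming every slice is filled with a single flat color and the background color is known. The sketch below works on a plain nested list of color tuples so it runs without OpenCV; `slice_percentages`, `background` and `min_share` are hypothetical names chosen for illustration.

```python
from collections import Counter


def slice_percentages(pixels, background=(255, 255, 255), min_share=0.01):
    """Estimate each slice's percentage from the number of pixels of its color.

    pixels is a list of rows, each row a list of color tuples.
    Colors covering less than min_share of the chart (for example
    anti-aliased edge pixels) are dropped before normalising.
    """
    counts = Counter(
        color for row in pixels for color in row if color != background
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    kept = {color: n for color, n in counts.items() if n / total >= min_share}
    kept_total = sum(kept.values())
    return {color: 100.0 * n / kept_total for color, n in kept.items()}


# Tiny synthetic "chart": 6 red pixels, 2 blue pixels, white background
RED, BLUE, WHITE = (255, 0, 0), (0, 0, 255), (255, 255, 255)
demo = [
    [WHITE, RED, RED, RED, WHITE],
    [WHITE, RED, RED, RED, WHITE],
    [WHITE, BLUE, BLUE, WHITE, WHITE],
]
shares = slice_percentages(demo)
```

With an image loaded by `cv2.imread`, the same function could be fed `[[tuple(px) for px in row] for row in image.tolist()]`, keeping in mind that OpenCV stores colors in BGR order and that this pure-Python loop is slow on large images.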

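Before deciding on a method, you need to analyse the objects inside any PDF. For example, the image shown in this PDF is actually a scaled-down rendering of an embedded image resource, so the image itself needs to be extracted to preserve its quality.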
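
One possible way to do that analysis from Python is to list the embedded image resources with PyMuPDF and look at their native pixel sizes. This is a sketch that assumes PyMuPDF is installed; `describe_image` and `list_embedded_images` are illustrative names.

```python
def describe_image(page_number, info):
    """Format the native size and format of one embedded image."""
    return f"page {page_number}: {info['width']}x{info['height']} px, format {info['ext']}"


def list_embedded_images(pdf_path):
    """Return one description line per image resource referenced by each page."""
    import fitz  # PyMuPDF; imported here so describe_image works without it

    lines = []
    with fitz.open(pdf_path) as doc:
        for page_number, page in enumerate(doc, start=1):
            for image in page.get_images(full=True):
                xref = image[0]  # first item is the image's cross-reference number
                info = doc.extract_image(xref)  # dict with 'width', 'height', 'ext', 'image', ...
                lines.append(describe_image(page_number, info))
    return lines
```

If the reported pixel size is much larger than the chart appears on the page, extracting the resource directly will give a better input for analysis than a page render.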


So in this case Poppler's pdfimages is a good approach. You can either inspect the PDF by eye (and decide), or run Poppler from a shell (command line) to extract the images.
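
If you would rather drive pdfimages from Python, a thin wrapper around `subprocess` is one option. The sketch below assumes the Poppler `pdfimages` executable is on the PATH; `build_pdfimages_command` and `extract_images` are hypothetical helper names, while `-png`, `-f` and `-l` are existing pdfimages options.

```python
import subprocess


def build_pdfimages_command(pdf_path, prefix, first_page=None, last_page=None):
    """Build the pdfimages argument list; the page range is optional."""
    command = ["pdfimages", "-png"]  # write extracted images as PNG files
    if first_page is not None:
        command += ["-f", str(first_page)]
    if last_page is not None:
        command += ["-l", str(last_page)]
    return command + [str(pdf_path), str(prefix)]


def extract_images(pdf_path, prefix, first_page=None, last_page=None):
    """Run pdfimages and raise CalledProcessError if it fails."""
    command = build_pdfimages_command(pdf_path, prefix, first_page, last_page)
    subprocess.run(command, check=True)
```

Leaving the page range unset extracts the images from every page in a single run.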


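If you extract all the images in one go, there is no need to work page by page; here, the image from page 2 is output in the same run as well.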


Note that OCR will not usually be able to recognise text set at an angle, such as the labels along the bottom edge; it generally handles only text at 0 or 90 degrees, such as that seen on the left.

For horizontal text, however, OCR is not usually a problem.
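
To read both the horizontal and the vertical labels, one option is to run OCR on the image at a few right-angle rotations and compare the results. The sketch below is library-agnostic: `ocr_at_angles` is a hypothetical helper, `image` is assumed to offer a PIL-style `rotate(angle, expand=True)` method, and `ocr` is any callable that returns text for an image.

```python
def ocr_at_angles(image, ocr, angles=(0, 90, 270)):
    """Run the given OCR callable on the image at each rotation angle.

    Returns a dict mapping angle to the stripped text that was recognised.
    """
    results = {}
    for angle in angles:
        # expand=True keeps the whole rotated image instead of cropping it
        rotated = image if angle == 0 else image.rotate(angle, expand=True)
        results[angle] = ocr(rotated).strip()
    return results
```

With Pillow and pytesseract installed, this could be called as `ocr_at_angles(Image.open("chart.png"), pytesseract.image_to_string)`, where the file name is a placeholder. Labels drawn at other angles would still need to be deskewed first.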




