How to import data from CSV file into Python with certain codes
  • Date: 2023-06-27 05:11:32
  • Tags:
  • python
  • csv

I have a file data.csv with the following contents:

    142.52  114.28  126.26  152.39  144.66  85.77   125.67  102.87  103.86  114.50
1   108.98  136.10  121.75  108.52  115.19  112.91  136.93  101.75  112.48  88.32
2   53.15   119.53  123.89  90.64   152.98  75.89   106.95  102.13  137.73  136.42
3   109.21  113.83  87.62   65.00   98.80   109.24  131.47  130.89  97.98   99.74
4   84.29   76.61   70.68   146.38  95.45   96.93   80.07   122.09  72.59   101.60
5   87.46   114.01  95.43   81.56   105.42  114.87  107.38  112.26  92.87   98.49
6   92.08   98.56   89.17   70.27   109.67  97.68   72.25   115.58  87.22   107.08
7   121.09  108.67  129.59  80.44   114.33  91.82   87.97   94.02   99.55   107.16
8   81.88   124.65  115.64  74.20   136.81  145.24  130.40  102.28  83.84   127.83
9   97.65   131.30  110.31  126.22  113.38  120.63  106.22  142.97  108.63  114.32

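I would prefer to use only Python's built-in features, such as the string methods split and strip and the list method append, to get the expected result.

Expected output:

A single one-dimensional list of the float values contained in the file.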


[ 142.52 ,
  114.28 ,
  126.26 ,
  152.39 ,
  144.66 ,
  85.77 ,
  125.67 ,
  102.87 ,
  103.86 ,
  114.50 ,
  108.98 ,
  136.10 ,
  121.75 ,
  108.52 ,
  115.19 ,
  112.91 ,
  136.93 ,
  101.75 ,
  112.48 ,
  88.32 ,
  … ]

The procedure should account for different CSV delimiters.

Answers

You should use pandas for the import and work with a DataFrame.

import pandas as pd
data = pd.read_csv("data.csv")

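Pandas DataFrames can do many interesting things and work well with other libraries, such as numpy for vectorized operations, and with machine learning libraries such as sklearn, xgboost, catboost, and so on.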

I found a way to handle it as a list. You can read the file into a NumPy ndarray, delete the first column, and flatten all the elements into one sequence. The file I used is shown below; save it as a .txt or .csv file.

0   142.52  114.28  126.26  152.39  144.66  85.77   125.67  102.87  103.86  114.50
1   108.98  136.10  121.75  108.52  115.19  112.91  136.93  101.75  112.48  88.32
2   53.15   119.53  123.89  90.64   152.98  75.89   106.95  102.13  137.73  136.42
3   109.21  113.83  87.62   65.00   98.80   109.24  131.47  130.89  97.98   99.74
4   84.29   76.61   70.68   146.38  95.45   96.93   80.07   122.09  72.59   101.60
5   87.46   114.01  95.43   81.56   105.42  114.87  107.38  112.26  92.87   98.49
6   92.08   98.56   89.17   70.27   109.67  97.68   72.25   115.58  87.22   107.08
7   121.09  108.67  129.59  80.44   114.33  91.82   87.97   94.02   99.55   107.16
8   81.88   124.65  115.64  74.20   136.81  145.24  130.40  102.28  83.84   127.83
9   97.65   131.30  110.31  126.22  113.38  120.63  106.22  142.97  108.63  114.32

Here is the code I wrote:

import numpy as np

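inpath = r"C:\Users\10696\Desktop\access6\app.txt"
kk = np.loadtxt(inpath)  # parse the whitespace-separated numbers into a 2-D array
kk = kk[:, 1:]           # drop the first (index) column
kk = kk.flatten()        # collapse to one dimension
kk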


A very "manual" and "raw" way to do what you need.

import csv

result = []
with open("test.csv") as f:
    # The delimiter parameter selects the CSV separator.
    data = csv.reader(f, delimiter=" ")
    for i in data:
        # I copied your CSV contents into a file, but it is not in a
        # standard CSV format: there are multiple spaces between columns,
        # so the empty fields have to be filtered out. If your CSV is
        # standard, result.extend(i[1:]) is enough.
        result.extend(_ for _ in i[1:] if _)
print(result)
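The snippet above collects strings. If floats are wanted, a minimal sketch of the same csv.reader approach could convert each field as it is collected. The two-row sample string is made up for illustration and stands in for the file; note that dropping the first field only works for the first row because that row begins with a space, which yields an empty leading field.

```python
import csv
import io

# Hypothetical two-row sample in the same layout as data.csv.
sample = "    142.52  114.28\n1   108.98  136.10\n"

result = []
for row in csv.reader(io.StringIO(sample), delimiter=" "):
    # Drop the first field (empty string or row index), skip the empty
    # fields produced by repeated spaces, and convert the rest to float.
    result.extend(float(field) for field in row[1:] if field)

print(result)
```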

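The question describes the output as a list of float values but actually shows a list of strings. Let's assume floats are wanted.

The source file is interesting because the first line has a different structure from the second and subsequent lines. The later lines appear to carry some kind of "index" value, which is not wanted in the output.

This means the first line of the input file has to be handled slightly differently from the other lines.

Hence (no imports):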

FILENAME = "/Volumes/G-Drive/data.csv"

def get_tokens(data, separator):
    yield next(data).split(separator)
    for line in data:
        yield line.split(separator)[1:]

def genlist(filename, separator=None):
    _list = []
    with open(filename) as data:
        for tokens in get_tokens(data, separator):
            _list.extend(map(float, tokens))
    return _list

print(genlist(FILENAME))

I'm doing the same assignment (I'll update if I can figure it out). So far:

data = []
d = open("data.csv", "r")
dataraw = d.read().strip().split()

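The result shows up as strings in a list, but I haven't yet figured out how to separate each string into individual values so that I can put them all in one big list: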

[142.52114.28126.26152.39144.66 85.77125.67102.87103.86114.50,108.98136.10121.75108.52115.19112.91136.93101.75112.48,88.32,53.15119.5123.89,0.64152.98,75.89106.95102.13137.73136.42,109.21113.83,87.62,60.00,98.80109.241314.7130.89,7.98,99.74,84.29,76.61,70.68146.38,95.45,96.93,0.07122.09,72.59101.60,87.46114.01,95.43,81.56105.42114.871007.38112.26,9.2.87,98.49,92.08,98.56,89.17,70.27109.67,97.68,72.25115.58,87.22107.08121.09108.67129.5980.44114.33,98.8287.97,94.02,99.55107.16,81.88124.65115.64,714.2136.81145.24130.40102.2883.84127.8397.65131.30110.31126.22113.38120.63106.22142.97108.63114.32]

Thank you for all your helpful input. I have figured it out using a lot of trial and error, as usual. Here is the solution I eventually ended up with:

data = []
datafile = open("data.csv", "r")

for line in datafile:
    entries = line.strip().split(",")
    for entry in entries:
        # split() with no argument tolerates runs of spaces
        values = entry.strip().split()
        for value in values:
            data.append(float(value))

My final solution

If the file has no row numbers, a simple solution using only Python's standard library could look like this:

data = []
with open("data.csv") as input_file:
    for row in input_file:
        for value in row.strip().split():
            data.append(float(value))

print(data)
  • 1: Open the file
  • 2: Loop over the rows of the file
  • 3: strip() removes leading and trailing whitespace, such as the newline, from each row
  • 4: split() splits the stripped row into a list of strings
  • 5: Loop over the split list and append each value, converted to float, to the data list
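The steps above assume whitespace-separated values and no row numbers. A rough sketch of how the same split/strip/append approach might cover other delimiters and a leading index is shown below. The function name parse_floats, the delimiters parameter, and the rule "a row with one more token than the first row starts with an index" are all assumptions made for illustration, not part of the original answers.

```python
def parse_floats(lines, delimiters=",;\t"):
    """Flatten rows of delimited numbers into one list of floats."""
    data = []
    width = None  # number of values found in the first non-empty row
    for row in lines:
        # Turn every candidate delimiter into a space, then split on whitespace.
        for delimiter in delimiters:
            row = row.replace(delimiter, " ")
        tokens = row.strip().split()
        if not tokens:
            continue  # skip blank lines
        if width is None:
            width = len(tokens)
        elif len(tokens) == width + 1:
            # Assumption: the extra leading token is a row index.
            tokens = tokens[1:]
        for token in tokens:
            data.append(float(token))
    return data


# Hypothetical samples: the question's layout and a semicolon-separated one.
spaces = ["    142.52  114.28  126.26\n", "1   108.98  136.10  121.75\n"]
semicolons = ["142.52;114.28\n", "1;108.98;136.10\n"]

print(parse_floats(spaces))
print(parse_floats(semicolons))
```

With a real file, the open file object can be passed directly, for example parse_floats(open("data.csv")). The heuristic does not detect an index when every row, including the first, carries one, and it would misread files that use a comma as the decimal mark.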



