Question

I have a file, data.csv, with the following contents:

    142.52  114.28  126.26  152.39  144.66  85.77   125.67  102.87  103.86  114.50
1   108.98  136.10  121.75  108.52  115.19  112.91  136.93  101.75  112.48  88.32
2   53.15   119.53  123.89  90.64   152.98  75.89   106.95  102.13  137.73  136.42
3   109.21  113.83  87.62   65.00   98.80   109.24  131.47  130.89  97.98   99.74
4   84.29   76.61   70.68   146.38  95.45   96.93   80.07   122.09  72.59   101.60
5   87.46   114.01  95.43   81.56   105.42  114.87  107.38  112.26  92.87   98.49
6   92.08   98.56   89.17   70.27   109.67  97.68   72.25   115.58  87.22   107.08
7   121.09  108.67  129.59  80.44   114.33  91.82   87.97   94.02   99.55   107.16
8   81.88   124.65  115.64  74.20   136.81  145.24  130.40  102.28  83.84   127.83
9   97.65   131.30  110.31  126.22  113.38  120.63  106.22  142.97  108.63  114.32

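I would prefer to use only Python built-ins, such as the string methods split and strip and the list method append, to get the expected result.

Expected output:

A single flat list of the floating-point values contained in the file.

The procedure should account for different CSV delimiters.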

Answer 1

You should use pandas to import the file and work with a DataFrame.

import pandas as pd
data = pd.read_csv("data.csv")

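Pandas DataFrames can do a lot of interesting things and work well with other libraries (such as numpy for vectorized operations) and with machine learning libraries (sklearn, xgboost, catboost, and so on).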

Answer 2

I found a way to handle it as a list: read the file into a NumPy ndarray, delete the first column, and flatten the remaining elements into one dimension.
The file I used is shown below; save it as a .txt or .csv file.

0   142.52  114.28  126.26  152.39  144.66  85.77   125.67  102.87  103.86  114.50
1   108.98  136.10  121.75  108.52  115.19  112.91  136.93  101.75  112.48  88.32
2   53.15   119.53  123.89  90.64   152.98  75.89   106.95  102.13  137.73  136.42
3   109.21  113.83  87.62   65.00   98.80   109.24  131.47  130.89  97.98   99.74
4   84.29   76.61   70.68   146.38  95.45   96.93   80.07   122.09  72.59   101.60
5   87.46   114.01  95.43   81.56   105.42  114.87  107.38  112.26  92.87   98.49
6   92.08   98.56   89.17   70.27   109.67  97.68   72.25   115.58  87.22   107.08
7   121.09  108.67  129.59  80.44   114.33  91.82   87.97   94.02   99.55   107.16
8   81.88   124.65  115.64  74.20   136.81  145.24  130.40  102.28  83.84   127.83
9   97.65   131.30  110.31  126.22  113.38  120.63  106.22  142.97  108.63  114.32

Here is the code I wrote:

import numpy as np

# The quotes and path separators were lost from this path; point it at your own file
inpath = r"C:Users10696Desktopaccess6app.txt"
kk = np.loadtxt(inpath)
kk = kk[:, 1:]
kk = kk.flatten()
kk

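The result is a one-dimensional NumPy array of the float values; call kk.tolist() if you need a plain Python list.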

Answer 3

A very "manual" and "primitive" way to do what you need.

import csv

result = []
with open("test.csv") as f:
    # The delimiter parameter selects the CSV separator
    data = csv.reader(f, delimiter=" ")
    for i in data:
        # I copied your contents into a csv file, but it is not in a
        # standard CSV format: there are multiple spaces between columns,
        # so the empty fields have to be filtered out. If your csv is
        # standard, no filtering is needed, just: result.extend(i[1:])
        result.extend(_ for _ in i[1:] if _)
print(result)
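
As a rough sketch of how this could be adapted to other separators, the function below passes the delimiter through to csv.reader and converts each kept field to float. The name floats_from_rows is made up for this illustration, and it assumes (as the code above does) that the first field of every row is either an index or empty and can be dropped.

```python
import csv

def floats_from_rows(lines, delimiter=" "):
    """Return one flat list of floats from delimited text lines.

    Assumption: the first field of each row is an index (or empty) and is dropped.
    """
    result = []
    for row in csv.reader(lines, delimiter=delimiter):
        # Skip the first field and ignore empty fields left by repeated delimiters
        result.extend(float(field) for field in row[1:] if field)
    return result

sample = ["    142.52  114.28", "1   108.98  136.10"]
print(floats_from_rows(sample))
```

csv.reader accepts any iterable of strings, so an open file object can be passed in place of the sample list.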

Answer 4

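The question describes the output as a list of floats, but what it actually shows is a list of strings. Let's assume floats are required.

The source file is interesting because the first line has a different structure from the second and subsequent lines. Those later lines appear to carry some kind of "index" value, which is not wanted in the output.

This means the first line of the input file must be handled slightly differently from the others.

Therefore (no imports):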

FILENAME = "/Volumes/G-Drive/data.csv"

def get_tokens(data, separator):
    yield next(data).split(separator)
    for line in data:
        yield line.split(separator)[1:]

def genlist(filename, separator=None):
    _list = []
    with open(filename) as data:
        for tokens in get_tokens(data, separator):
            _list.extend(map(float, tokens))
    return _list

print(genlist(FILENAME))
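
The default separator=None matters for a file padded with several spaces between columns. The short demonstration below, using a made-up sample line, shows how str.split behaves with None versus an explicit single space.

```python
line = "1   108.98  136.10\n"

# separator None: any run of whitespace is one separator, and the ends are trimmed
tokens_default = line.split(None)

# explicit single space: every space is a boundary, so empty strings appear
tokens_space = line.split(" ")

print(tokens_default)
print(tokens_space)
```

With an explicit separator such as " ", the empty strings would have to be filtered out before calling float, since float("") raises a ValueError.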

Answer 5

I'm doing the same assignment (I'll update if I can figure it out). So far:

data = []
d = open("data.csv", "r")
dataraw = d.read().strip().split()

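The result comes out as strings in a list, but I haven't figured out how to separate each string into individual values so that I can put them all in one big list: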

[142.52114.28126.26152.39144.66 85.77125.67102.87103.86114.50，108.98136.10121.75108.52115.19112.91136.93101.75112.48，88.32，53.15119.5123.89，0.64152.98，75.89106.95102.13137.73136.42，109.21113.83，87.62，60.00，98.80109.241314.7130.89，7.98，99.74，84.29，76.61，70.68146.38，95.45，96.93，0.07122.09，72.59101.60，87.46114.01，95.43，81.56105.42114.871007.38112.26,9.2.87,98.49，92.08,98.56,89.17,70.27109.67,97.68,72.25115.58,87.22107.08121.09108.67129.5980.44114.33,98.8287.97,94.02,99.55107.16，81.88124.65115.64,714.2136.81145.24130.40102.2883.84127.8397.65131.30110.31126.22113.38120.63106.22142.97108.63114.32]

Answer 6

Thank you for all your helpful input. I figured it out using a lot of trial and error, as usual. Here is the solution I eventually ended up with:

data = []
datafile = open("data.csv", "r")

for line in datafile:
    entries = line.strip().split(",")
    for entry in entries:
        values = entry.strip().split()
        for value in values:
            data.append(float(value))

datafile.close()

My final solution

Answer 7

If the file has no row numbers, a simple solution using Python's standard library could look like this:

data = []
with open("data.csv") as input_file:
    for row in input_file:
        for value in row.strip().split():
            data.append(float(value))

print(data)

1: Open the file
2: Loop over the rows of the file
3: strip() removes leading and trailing whitespace, such as the newline, from each row
4: split() splits the stripped row into a list of strings
5: Loop over the split list and append each value, converted to a float, to the data list
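
For a file that does carry row numbers on some lines, or that uses a different separator, the same steps can be stretched a little. The following is only a sketch: the function name parse_floats, the default set of delimiters, and the rule "a row with one more value than the first row starts with an index" are all assumptions made for this illustration, not something stated in the question.

```python
def parse_floats(lines, delimiters=",;\t"):
    """Flatten delimited text lines into one list of floats."""
    data = []
    width = None  # number of values in the first non-empty row
    for row in lines:
        # Turn every accepted delimiter into a space so split() can handle it
        for ch in delimiters:
            row = row.replace(ch, " ")
        values = row.strip().split()
        if not values:
            continue  # skip blank lines
        if width is None:
            width = len(values)
        elif len(values) == width + 1:
            # Assumption: the extra leading token is a row index
            values = values[1:]
        for value in values:
            data.append(float(value))
    return data

rows = ["    142.52  114.28  126.26\n", "1   108.98  136.10  121.75\n"]
print(parse_floats(rows))
```

Because it iterates over lines, an open file object can be passed directly. Note that treating the comma as a delimiter would break files that use a decimal comma.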
