Multi-line fields and inserting columns
  • Time: 2024-07-22 10:57:57
  • Tags: pandas, csv
We need to import our passwords from a format that is not explicitly understood by Bitwarden. The old tool exports to CSV, and I've built a simple Python script to rename some columns and insert a couple of new ones. The script I'm using is below:

    import os, re, glob
    import pandas as pd

    df = pd.read_csv(glob.glob('accounts_*.csv')[0])
    df['collections'] = re.sub('folder-', '', os.path.basename(os.getcwd()))
    df['type'] = 'login'
    result = df.rename(columns={
        'nickname': 'name',
        'additionalInfo': 'notes',
        'url': 'login_uri',
        'username': 'login_username',
        'password': 'login_password',
        'twofaSecret': 'login_totp',
    })
    result.to_csv("test.csv")

It's working well except for one issue: each record has a notes field that can span multiple lines, e.g.

    nickname,username,password,additionalInfo
    Some site,username,PASSWORD,Some website we use
    Another site,username2,PASSWORD2,A longer description with a carriage return
    And some more text
    And more
    And more

All the records need the same two extra columns, so I'm adding them this way:

    df['collections'] = 'Some Collection'
    df['type'] = 'login'

The problem is that it adds those columns to every line of the multi-line fields, e.g.

    collections,type,name,login_username,login_password,notes
    Some Collection,login,username,PASSWORD,Some website we use
    Some Collection,login,Another site,A longer description with a carriage return
    Some Collection,login,And some more text
    Some Collection,login,And more
    Some Collection,login,And more

Bitwarden is actually able to import the file successfully; it's just that every line in the note is prefixed with "Some Collection,login,". I could use a regex to remove the unwanted text, but I wondered whether I could use pandas in a different way so that it doesn't treat every newline as a new record?

Thanks for any tips!
Answers
From your example, I don't think there's a way to differentiate actual new records from notes that span multiple lines when importing the CSV file into pandas, unless you can export the file with the values enclosed in quotes, as suggested in the comments.

If you want to solve it with pandas, one option would be to import the CSV as is, group by the values in nickname, username and password, then aggregate with "first" on every column except the last, where you .join all the rows. Using your example as input:

    import io
    import pandas as pd
    import numpy as np

    data = """nickname,username,password,additionalInfo
    Some site,username,PASSWORD,Some website we use
    Another site,username2,PASSWORD2,A longer description with a carriage return
    And some more text
    And more
    And more"""

    df = pd.read_csv(io.StringIO(data))

    # Continuation lines land in "nickname" with no username: move them to the notes column.
    df.loc[df["username"].isna(), "additionalInfo"] = df["nickname"]
    df.loc[df["username"].isna(), "nickname"] = np.nan

    # Forward-fill the key columns so continuation rows share the keys of their record.
    g = df[["nickname", "username", "password"]].ffill()

    df = df.groupby([g["nickname"], g["username"], g["password"]], as_index=False).agg(
        {
            "nickname": "first",
            "username": "first",
            "password": "first",
            "additionalInfo": " ".join,
        }
    )

Result:

           nickname   username   password                                     additionalInfo
    0  Another site  username2  PASSWORD2  A longer description with a carriage return An...
    1     Some site   username   PASSWORD                                Some website we use
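A variant of the same grouping idea, offered only as a rough sketch: instead of forward-filling the key columns, number the records with a running counter, and join the note lines with "\n" so the original line breaks survive. This assumes every real record has a non-empty username and that continuation lines contain no commas; the names is_continuation, record_id and merged are made up for illustration.

```python
import io
import pandas as pd

data = """nickname,username,password,additionalInfo
Some site,username,PASSWORD,Some website we use
Another site,username2,PASSWORD2,A longer description with a carriage return
And some more text
And more
And more"""

df = pd.read_csv(io.StringIO(data))

# Assumption: a row without a username is a continuation line of the previous note.
is_continuation = df["username"].isna()
df.loc[is_continuation, "additionalInfo"] = df.loc[is_continuation, "nickname"]

# Every record start bumps the counter, so continuation rows inherit its id.
record_id = (~is_continuation).cumsum().rename("record_id")

merged = (
    df.groupby(record_id)
    .agg(
        {
            "nickname": "first",
            "username": "first",
            "password": "first",
            "additionalInfo": "\n".join,  # keep the note's line breaks
        }
    )
    .reset_index(drop=True)
)
```

Because the group key is a counter rather than the field values, two records that happen to share the same nickname, username and password would not be merged into one, and the records keep their original order.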
In the end I discovered that the original export did have double quotes around the multi-line comments, but not around other text fields. I had done some other manipulation of the files with pandas before I tried handling this, which had removed the double quotes. After going back to the original file with the quotes, pandas seemed to handle it, even though the quoting is slightly inconsistent. Thanks for the tips and suggestions though!
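For anyone landing here later, a small sketch of why the quoted file works. The sample data is reconstructed from the question with quotes added around the multi-line note only (an assumption about what the original export looked like), and index=False is my choice rather than something from the original script.

```python
import io
import pandas as pd

# Only the multi-line note is quoted; the other fields are bare.
raw = (
    "nickname,username,password,additionalInfo\n"
    "Some site,username,PASSWORD,Some website we use\n"
    'Another site,username2,PASSWORD2,"A longer description with a carriage return\n'
    "And some more text\n"
    "And more\n"
    'And more"\n'
)

# Newlines inside a quoted field belong to that field, so this yields two records.
df = pd.read_csv(io.StringIO(raw))

# The constant columns are now added once per record, not once per physical line.
df["collections"] = "Some Collection"
df["type"] = "login"

result = df.rename(
    columns={
        "nickname": "name",
        "additionalInfo": "notes",
        "username": "login_username",
        "password": "login_password",
    }
)

# By default to_csv quotes any field that contains a newline, so the note
# stays a single field when the output is read back.
out = result.to_csv(index=False)
roundtrip = pd.read_csv(io.StringIO(out))
```

Mixed quoting is fine for a CSV parser: quotes are only required around fields that contain a delimiter, a quote character or a line break.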



