Multi-line fields and inserting columns
We need to import our passwords from a format that is not explicitly understood by Bitwarden.
The old tool exports to CSV, and I've built a simple Python script to rename some columns and insert a couple of new ones. The script I'm using is below:
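import os, re, glob
import pandas as pd
# Read the first export file found in the current directory
df = pd.read_csv(glob.glob("accounts_*.csv")[0])
# Collection name is the current directory name with the "folder-" prefix removed
df["collections"] = re.sub("folder-", "", os.path.basename(os.getcwd()))
df["type"] = "login"
# Map the old tool's column names to the names Bitwarden expects
result = df.rename(columns={"nickname": "name", "additionalInfo": "notes", "url": "login_uri", "username": "login_username", "password": "login_password", "twofaSecret": "login_totp"})
result.to_csv("test.csv")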
It's working well except for one issue: each record has a notes field that can span multiple lines, e.g.
nickname,username,password,additionalInfo
Some site,username,PASSWORD,Some website we use
Another site,username2,PASSWORD2,A longer description with a carriage return
And some more text
And more
And more
All the records need the same two extra columns, so I'm adding them this way:
df["collections"] = "Some Collection"
df["type"] = "login"
The problem is it adds those columns on the multi-line fields, e.g.
collections,type,name,login_username,login_password,notes
Some Collection,login,username,PASSWORD,Some website we use
Some Collection,login,Another site,A longer description with a carriage return
Some Collection,login,And some more text
Some Collection,login,And more
Some Collection,login,And more
Bitwarden is actually able to import the file successfully, but every line in the note is prefixed with Some Collection,login,.
I could use a regex to remove the unwanted text, but I wondered whether I could use pandas in a different way so that it doesn't treat every newline as a new record?
Thanks for any tips!
Answers
From your example, I don't think there's a way to tell genuine new records apart from notes that span multiple lines when importing the CSV file into pandas, unless you can export the file with the values enclosed in quotes, as suggested in the comments.
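As a minimal sketch of that quoted case: the sample below is made up to mirror the question, and it assumes the export wraps the multi-line notes field in double quotes. Under that assumption, pd.read_csv keeps the embedded newlines inside one value instead of starting new records:

```python
import io
import pandas as pd

# Hypothetical export: only the multi-line notes field is quoted.
data = '''nickname,username,password,additionalInfo
Some site,username,PASSWORD,Some website we use
Another site,username2,PASSWORD2,"A longer description with a carriage return
And some more text
And more"
'''

df = pd.read_csv(io.StringIO(data))

# The quoted field is parsed as a single value, newlines included.
print(len(df))
print(repr(df.loc[1, "additionalInfo"]))
```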
If you want to solve it with pandas, one option would be to import the CSV as is, group by the values in nickname, username and password, then aggregate with first on every column except the last, where you .join all the rows.
Using your example as input:
import io
import pandas as pd
import numpy as np
data = """nickname,username,password,additionalInfo
Some site,username,PASSWORD,Some website we use
Another site,username2,PASSWORD2,A longer description with a carriage return
And some more text
And more
And more"""
df = pd.read_csv(io.StringIO(data))
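# Continuation lines parse with only the first column filled: move that text into additionalInfo
df.loc[df["username"].isna(), "additionalInfo"] = df["nickname"]
df.loc[df["username"].isna(), "nickname"] = np.nan
# Forward-fill the key columns so each continuation line inherits its record's keys
g = df[["nickname", "username", "password"]].ffill()
df = df.groupby([g["nickname"], g["username"], g["password"]], as_index=False).agg(
    {
        "nickname": "first",
        "username": "first",
        "password": "first",
        "additionalInfo": " ".join,
    }
)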
       nickname   username   password                                      additionalInfo
0  Another site  username2  PASSWORD2  A longer description with a carriage return An...
1     Some site   username   PASSWORD                                 Some website we use
In the end I discovered that the original export did have double-quotes around the multi-line comments, but not around other text fields.
I had done some other manipulation of the files with Pandas before I tried handling this, which had removed the double-quotes.
After going back to the original file with the quotes, pandas seemed to handle the file, despite its quoting being slightly inconsistent.
Thanks for the tips and suggestions though!
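For reference, here is a sketch of the whole conversion run on a partly quoted export like the one described. The sample data and the collection name are placeholders, and index=False is an addition that is not in the original script. With its default quoting, to_csv should re-quote any field that contains a newline, so each note stays in a single record when the file is read again:

```python
import io
import pandas as pd

# Placeholder export: plain fields unquoted, multi-line note quoted.
data = '''nickname,username,password,additionalInfo
Some site,username,PASSWORD,Some website we use
Another site,username2,PASSWORD2,"A longer description
And some more text"
'''

df = pd.read_csv(io.StringIO(data))
df["collections"] = "Some Collection"  # placeholder collection name
df["type"] = "login"
result = df.rename(columns={
    "nickname": "name",
    "additionalInfo": "notes",
    "username": "login_username",
    "password": "login_password",
})

# Write to an in-memory buffer instead of test.csv so the output can be inspected.
out = io.StringIO()
result.to_csv(out, index=False)
print(out.getvalue())

# Reading the output back gives one row per record, with the note intact.
round_trip = pd.read_csv(io.StringIO(out.getvalue()))
print(len(round_trip))
```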