I am a beginner and want to make a barcode out of this DNA sequence using Python code. It is supposed to read each block of 1024 nucleotides and count the 4-mers (combinations of 4 nucleotides, e.g. AAAA, AAAC, AAAG, AAAT, ..., TTTT). Each 4-mer has its own index in an array of size 256 (there are 4^4 = 256 possible 4-mers). If it finds AAAA within the first 1024 nucleotides, it stores the count at the index for AAAA, and so on for every 4-mer; then it moves on to the next 1024 nucleotides until it is done with the whole sequence. That will create a 2D array, which will be turned into a grayscale PNG.
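The 256 array positions described above can be computed directly instead of being searched for with `mers.index()`. This is a minimal sketch, assuming the lexicographic order listed in the question (AAAA, AAAC, AAAG, AAAT, ..., TTTT): each 4-mer is read as a base-4 number with A=0, C=1, G=2, T=3. The names `mer_index` and `ALL_MERS` are hypothetical. Note that the `mers` list built in the script below uses a different order (its first character changes fastest, so index 1 is CAAA), so the two indexings are not interchangeable.

```python
from itertools import product

# Hypothetical mapping: each base is one base-4 digit
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def mer_index(mer):
    """Return the 0-based array index of a k-mer, or None if it has a non-ACGT base."""
    index = 0
    for base in mer:
        value = BASE_VALUE.get(base)
        if value is None:  # e.g. 'N' or another ambiguity code
            return None
        index = index * 4 + value  # shift left one base-4 digit, add the new one
    return index


# All 256 4-mers in the same order, usable as column labels
ALL_MERS = ["".join(p) for p in product("ACGT", repeat=4)]

print(mer_index("AAAA"), mer_index("AAAC"), mer_index("TTTT"))  # 0 1 255
print(ALL_MERS[:4])  # ['AAAA', 'AAAC', 'AAAG', 'AAAT']
```

Because the index is arithmetic, a lookup costs four steps instead of scanning a 256-element list, and a 4-mer containing `N` is reported as `None` rather than raising `ValueError`.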
My problem is that it only processed the first 1024 nucleotides and displayed them across the entire 1024x256 image.
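The counting loop in the script runs `for w in range(sample_size)`, so it only ever looks at positions 0 to 1023 and fills a single 256-element list. Getting one row per 1024-base window needs an outer loop over the windows, with a fresh row of counts for each one. This is a minimal sketch under stated assumptions: windows are consecutive and non-overlapping, only 4-mers lying entirely inside a window are counted, 4-mers containing a non-ACGT base are skipped, and the last window may be shorter than 1024. The function name `count_windows` is hypothetical.

```python
import numpy as np

BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_windows(dna, window=1024, k=4):
    """Return an array with one row per window and one column per k-mer (4**k columns).

    The sequence is cut into consecutive, non-overlapping windows of `window` bases.
    Inside each window, every k-mer that fits entirely in the window is counted
    (overlapping k-mers, step 1). k-mers containing a non-ACGT base are skipped.
    """
    dna = dna.upper()
    n_windows = (len(dna) + window - 1) // window  # last window may be shorter
    counts = np.zeros((n_windows, 4 ** k), dtype=np.int64)
    for row in range(n_windows):
        chunk = dna[row * window:(row + 1) * window]
        for start in range(len(chunk) - k + 1):
            index = 0
            for base in chunk[start:start + k]:
                value = BASE_VALUE.get(base)
                if value is None:
                    break
                index = index * 4 + value
            else:  # only reached when the k-mer had no non-ACGT base
                counts[row, index] += 1
    return counts


matrix = count_windows("ACGT" * 1000)  # 4000 bases -> 4 windows
print(matrix.shape)  # (4, 256)
```

Each full window contributes 1024 - 4 + 1 = 1021 counts to its row; whether 4-mers straddling two windows should also be counted is a design choice the question leaves open.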
DNA: https://1drv.ms/f/s!AuXxv7yqjA_FlS_ujYOMUvikWg8E
# read the DNA sequence
fasta_file = open(r"C:\path\Escherichia_coli_ATCC_10798.fasta", "r")
SE = fasta_file.read()
fasta_file.close()
# drop the first 177 characters (the FASTA header line)
seq = SE[177:]
dna_sequence = seq.replace("\n", "")
# Sample size and mer length
# sample is the window that will go through the whole sequence
sample_size = 1024
mer_length = 4
# Array to store the counts of each mer
barcode = [0] * 256
# Generate all possible 4-mers
mers = []
for i in range(256):
    mer = ""
    for j in range(4):
        mer += "ACGT"[i % 4]
        i //= 4
    mers.append(mer)
# Loop through the sample and count the occurrences of each mer
# (this loop covers start positions 0 to sample_size - 1 only)
for w in range(sample_size):
    mer = dna_sequence[w:w+mer_length]
    barcode[mers.index(mer)] += 2
# Print the counts of each mer
#print(mers[i], ":", barcode[i])
print(barcode)
# image
# convert the numpy array to an image with the Pillow library
from PIL import Image
import numpy as np
# Convert the barcode list (256 counts) to an 8-bit array
code = np.array(barcode, dtype=np.uint8)
# Create an Image object from the barcode array
image = Image.fromarray(code)
# Resize the image to the desired size (1024x4000)
image = image.resize((1024, 4000))
# Save the image
image.save("Escherichia_coli.png")
# Display the image (optional)
image.show()
The dark image is what I got; the other one is what I was supposed to get.
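The expected picture has one pixel row per 1024-base window and one pixel column per 4-mer, which requires a 2-D array. In the script, `np.array(barcode)` is 1-D (256 values from a single window), so there is only one window of data for `resize()` to stretch over the whole image (Pillow should treat a 1-D array as a single pixel column). This is a minimal sketch, assuming a 2-D count array of shape (number of windows, 256) and scaling by the largest count so the values fill the 0 to 255 range; `matrix_to_image` is a hypothetical name.

```python
import numpy as np
from PIL import Image


def matrix_to_image(counts):
    """Scale a 2-D count matrix to 0..255 and return it as an 8-bit grayscale image.

    Each row (one window) becomes one pixel row; each column (one 4-mer) one pixel column.
    """
    counts = np.asarray(counts, dtype=np.float64)
    peak = counts.max() if counts.size else 0.0
    # Scale so the most frequent count becomes white (255); all-zero input stays black
    scaled = counts / peak * 255.0 if peak > 0 else counts
    # A 2-D uint8 array is converted to a grayscale ("L") image
    return Image.fromarray(scaled.round().astype(np.uint8))


demo = np.array([[0, 5, 10], [10, 0, 5]])
img = matrix_to_image(demo)
print(img.size, img.mode)  # (3, 2) L
```

Pillow reports `size` as (width, height), so a genome split into N windows gives an image 256 pixels wide and N pixels tall. To enlarge it without blurring the blocks, nearest-neighbour resampling (for example `img.resize((new_width, new_height), Image.NEAREST)`) keeps each count as a sharp rectangle.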
I don't know how to attach the DNA sequence. Some info: genome_id="531534bd23a542ae", atcc_catalog_number="ATCC 10798", species="Escherichia coli"
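Since the file itself isn't attached, one thing worth checking is the header handling: the script drops a fixed 177 characters (`SE[177:]`), which only works if the header line has exactly that length. This is a minimal sketch that instead skips every line starting with `>` and upper-cases the bases (lowercase, soft-masked bases would otherwise make `mers.index()` fail). It assumes a standard FASTA layout; `read_fasta_sequence` is a hypothetical helper. If the file holds several records (for example a chromosome plus plasmids), this simply concatenates them, which adds a few artificial 4-mers at the joins.

```python
import os
import tempfile


def read_fasta_sequence(path):
    """Concatenate all sequence lines of a FASTA file, skipping '>' header lines."""
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                parts.append(line.upper())
    return "".join(parts)


# Small demo with a temporary two-record file
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">record1 some description\nACGTAC\nGTTT\n>record2\nacgt\n")
    demo_path = tmp.name
print(read_fasta_sequence(demo_path))  # ACGTACGTTTACGT
os.remove(demo_path)
```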
Link to a similar genome: