I am developing a speech recognition model using TensorFlow, and I have the following directory structure for my dataset. Within the ./dataset directory, I have subdirectories for validation, training, and testing (dev-clean, test-clean, train-clean-100). Each of these subdirectories contains several directories, each named with a unique ID, and inside each of those there are two files: audio.wav and text.txt.
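For illustration, the layout looks roughly like this (the numeric IDs are placeholders, not my actual folder names):

./dataset/
    dev-clean/
        84/
            audio.wav
            text.txt
        ...
    test-clean/
        ...
    train-clean-100/
        19/
            audio.wav
            text.txt
        ...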
I want to perform preprocessing on the audio and text data on the fly while loading the dataset. Here is my current dataset loading function:
# Imports used by the snippets below.
from typing import Any
import os

import tensorflow as tf


def build_dataset(self, path: str = "./dataset/train-clean-100") -> Any:
    # Collect the file paths only; the actual data is read later in the map() calls.
    audio_files = []
    text_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file == "audio.wav":
                audio_files.append(os.path.join(root, file))
            elif file == "text.txt":
                text_files.append(os.path.join(root, file))

    audio_dataset = tf.data.Dataset.from_tensor_slices(audio_files)
    audio_dataset = audio_dataset.map(
        self.preprocess_audio, num_parallel_calls=tf.data.experimental.AUTOTUNE)

    text_dataset = tf.data.Dataset.from_tensor_slices(text_files)
    text_dataset = text_dataset.map(
        self.preprocess_text, num_parallel_calls=tf.data.experimental.AUTOTUNE)

    # Pair each audio example with its transcript.
    dataset = tf.data.Dataset.zip((audio_dataset, text_dataset))
    return dataset

def preprocess_audio(self, audio_path: str) -> tf.Tensor:
    # Read and decode a single WAV file to a fixed 20000 samples.
    audio = tf.io.read_file(audio_path)
    audio, _ = tf.audio.decode_wav(audio, desired_samples=20000)
    return audio

def preprocess_text(self, text_path: tf.Tensor) -> Any:
    # Read the transcript and tokenize it with the vectorizer.
    texts = tf.io.read_file(text_path)
    tokens = self.vectorizer(texts)
    return tokens
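For context, self.vectorizer is my text tokenizer, built elsewhere in the class. The exact setup shouldn't matter for this question; a simplified character-level stand-in would look something like this:

# Simplified stand-in for the tokenizer referenced above as self.vectorizer;
# my real vocabulary and settings differ.
self.vectorizer = tf.keras.layers.TextVectorization(
    standardize="lower",   # lowercase transcripts before tokenizing
    split="character",     # character-level tokens
    output_mode="int",     # map each character to an integer id
)
self.vectorizer.adapt(["example transcript used to build the vocabulary"])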
This implementation works fine when I don't use batching (loading a single item at a time). However, when I call .batch() on the dataset, my development environment (VS Code) and Python both crash. I suspect this is because the entire dataset is being loaded into memory rather than just the file names.
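For reference, the crash happens as soon as I chain a batch call onto the returned dataset, roughly like this (the batch size of 32 is arbitrary):

dataset = self.build_dataset("./dataset/train-clean-100")
dataset = dataset.batch(32)  # this is the call that triggers the crash
for audio, tokens in dataset.take(1):
    print(audio.shape, tokens.shape)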
How can I efficiently batch this dataset while still loading the data on the fly, rather than loading everything into memory at once?