The SynTex collection consists of audio "textures" - sounds whose description is static at some sufficiently large temporal scale. Examples of real-world textures include rain, engines, crowds, birds, room tone, ocean waves, rolling, and bubbles. Textures can also be abstract - made up of time-frequency distributions of signals, ranging from simple windowed sine-wave 'grains', chirps, and noise bursts to arbitrarily diverse sound elements.
These datasets are designed and labeled to support training conditional generative audio models - models that learn to generate complex audio under parametric control.
There are plenty of databases for training speech and music synthesizers, as well as more complex audio sets (environmental audio) for training event detectors, stream segregators, and classifiers. However, training generative audio models requires densely and accurately labeled sounds - many parameters that may be difficult or impossible to extract from recorded audio.
The datasets are synthetic - in fact, they are downloaded as code and constructed on your local machine. The default dataset can be used as a reference, but generation is controlled by a configuration file that provides options for "wrangling" the data to train, explore, or stress-test your model as necessary. You can customize anything from the sample rates and the parameter ranges you want to train with (e.g., [0,1] or [-1,1]) that map to synthesizer parameters, to the resolution and ranges of the synthesis parameters used for dataset creation.
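For instance, mapping a user-facing training range onto a synthesizer range is a simple linear rescaling. The function below is a minimal sketch, not part of the SynTex API; the function name is ours, and the example synth range [0.333, 0.666] is borrowed from the irreg_exp example later on this page:

```python
def user_to_synth(value, user_min=-1.0, user_max=1.0,
                  synth_min=0.333, synth_max=0.666):
    """Linearly map a user-facing training value (e.g. in [-1, 1])
    onto the range actually driving the synthesizer parameter."""
    t = (value - user_min) / (user_max - user_min)
    return synth_min + t * (synth_max - synth_min)

# A model conditioned on [-1, 1] driving irreg_exp in [0.333, 0.666]:
print(user_to_synth(0.0))  # -> 0.4995, the midpoint of the synth range
```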
The software is open source and necessarily distributed with each dataset. A JupyterLab notebook also comes with each dataset in case you are interested in extending it or developing new synthesis models and datasets.
SynTex is a growing dataset collection together with the software used to generate texture datasets on demand. The datasets are completely configurable through the config file, and each synthesizer's defaults are listed in its config file. Running the default config file creates the dataset automatically.
Wrangling: Users control the dataset generation process via the following key attributes in the configuration file (a hypothetical sketch of such a configuration follows the table).
| Wrangling aspect | Configuration attributes |
| --- | --- |
| Audio quality | ComputeSR, datafileSR |
| Texture variation | randomSeed |
| Synthesizer parameters | choose which synthesizer parameters are fixed or sampled |
| Dataset ranges and resolution | synth_minval, synth_maxval, user_nvals |
| Audio labels (saved to the metadata file, mapped to synth values) | user_minval, user_maxval, user_nvals |
| Audio duration | soundDuration (per parameter setting), numChunks |
| Metadata formats | recordFormat (params, tfrecords, or njson) |
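A wrangling configuration might look something like the following. This is a hypothetical sketch written as a Python dict for illustration only; the exact file syntax, key spellings, and defaults are defined by the config file shipped with each dataset.

```python
# Hypothetical wrangling settings; the shipped config file is authoritative.
config = {
    "ComputeSR": 44100,        # sample rate used during synthesis
    "datafileSR": 16000,       # sample rate of the audio files written to disk
    "randomSeed": 42,          # controls texture variation between runs
    "soundDuration": 4.0,      # seconds of audio per parameter setting
    "numChunks": 2,            # number of segments each sound is split into
    "recordFormat": "params",  # one of: params, tfrecords, njson
    "irreg_exp": {             # per-parameter settings: fixed or sampled
        "synth_minval": 0.333, # range actually driving the synthesizer
        "synth_maxval": 0.666,
        "user_minval": 0.0,    # range written to the metadata labels
        "user_maxval": 1.0,
        "user_nvals": 4,       # resolution: number of values sampled
    },
}
```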
Datasets can be generated with paramManager or TensorFlow TFRecord metadata files for your data-loading convenience. Support for pandas DataFrames is expected in the near future.
This section shows the different metadata file formats that SynTex can generate. The parameters specified in the configuration file are mapped to these metadata representations in the dataset.
The paramManager format creates a JSON representation for each audio file in the dataset, along with the parameters and their descriptions. See the paramManager GitHub repository for more information about using this format for data loading.
{ "meta": { "filename": "PopPatternSynth--Irregularity-00.00--v-01.params", "irreg_exp_user_doc": "map to natural synth irregularity param [0, 1]", "irreg_exp_synth_doc": "(n/event-per-second) as standard deviation of gaussian around regularly spaced events normalized by events-per-second.", "rate_exp_user_doc": "Fixed value. mapping to natural synth rate_exp values [2,3]", "rate_exp_synth_doc": "2**n events per second.", "cf_user_doc": "Fixed value. mapping to natural synth rate_exp values [2,3]", "cf_synth_doc": "Center frequency in hz." }, "irreg_exp": { "times": [ 0, 4 ], "values": [ 0.0, 0.0 ], "units": "natural", "nvals": 21, "minval": 0, "maxval": 1, "origMinval": 0, "origUnits": null, "origMaxval": 1 } }
The tfrecords format serializes each audio segment together with its parameter values, ranges, and units as TensorFlow records. A parsed record looks like this:

```
{'audio': <tf.Tensor: shape=(88200,), dtype=float32, numpy=array([ 0. , 0. , 0. , ..., -0.29178527, -0.30153185, -0.3096234 ], dtype=float32)>,
 'pfname': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'newDataset/PopPatternSynth--Irregularity-00.00--v-00.params'], dtype=object)>,
 'segmentNum': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([0])>,
 'soundDuration': <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 2.], dtype=float32)>,
 'cf': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([540.], dtype=float32)>,
 'cf_synth_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'cf_synth_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'cf_synth_units': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object)>,
 'cf_user_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'cf_user_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'cf_user_nvals': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([1])>,
 'irreg_exp': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>,
 'irreg_exp_synth_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.666], dtype=float32)>,
 'irreg_exp_synth_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.333], dtype=float32)>,
 'irreg_exp_synth_units': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object)>,
 'irreg_exp_user_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>,
 'irreg_exp_user_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>,
 'irreg_exp_user_nvals': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([4])>,
 'rate_exp': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.], dtype=float32)>,
 'rate_exp_synth_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'rate_exp_synth_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'rate_exp_synth_units': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object)>,
 'rate_exp_user_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'rate_exp_user_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'rate_exp_user_nvals': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([1])>}
```
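A typical way to consume such a file with tf.data might look like the following sketch. The feature spec is an assumption based on the shapes shown above (and lists only a few of the features); the authoritative spec should be taken from the notebook distributed with the dataset, and the file name is hypothetical.

```python
import tensorflow as tf

# Assumed feature spec, matching the shapes printed above; the true spec
# is defined by the SynTex writer.
feature_spec = {
    "audio": tf.io.FixedLenFeature([88200], tf.float32),
    "pfname": tf.io.FixedLenFeature([1], tf.string),
    "segmentNum": tf.io.FixedLenFeature([1], tf.int64),
    "soundDuration": tf.io.FixedLenFeature([2], tf.float32),
    "irreg_exp": tf.io.FixedLenFeature([1], tf.float32),
}

def parse(record):
    return tf.io.parse_single_example(record, feature_spec)

# "newDataset.tfrecords" is a hypothetical file name.
dataset = tf.data.TFRecordDataset("newDataset.tfrecords").map(parse)
for example in dataset.take(1):
    print(example["audio"].shape, example["irreg_exp"].numpy())
```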
The njson format is a compressed format that writes only the variable parameters into the JSON representation. It is used specifically for training this version of GANSynth.
"PopPatternSynth--Irregularity-00.00--v-00": { "irreg_exp_natural": 1.0, "irreg_exp_norm": 0.0, "samplerate": 16000, "sound_name": "PopPatternSynth", "sound_source": "PopPatternSynth", "sound_source_int": 1, "sound_source_str": "Generated" }
Lonce Wyse and Prashanth T.R. are the key contributors to this collection and its supporting software. Please send questions, comments, and bug reports to lonce.
This research has been supported by a Singapore MOE Tier 2 grant, “Learning Generative Recurrent Neural Networks”, and by an NVIDIA Corporation Academic Programs GPU grant.
Please follow developments to SynTex on its GitHub page.
The SynTex DSSynth module is a hierarchical synthesis unit designed for creating audio textures with event patterns at various time scales.
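As a generic sketch of the idea (plain NumPy, not the DSSynth API): regularly spaced events at one time scale can be jittered by a Gaussian whose standard deviation is the irregularity parameter normalized by the event rate (as the irreg_exp documentation above describes), and each event can in turn trigger a finer-scale pattern of its own.

```python
import numpy as np

rng = np.random.default_rng(42)

def event_times(duration, events_per_sec, irregularity):
    """Regularly spaced event times, jittered by a Gaussian whose standard
    deviation is irregularity / events_per_sec (cf. irreg_exp above)."""
    grid = np.arange(0.0, duration, 1.0 / events_per_sec)
    jitter = rng.normal(0.0, irregularity / events_per_sec, size=grid.shape)
    return np.clip(grid + jitter, 0.0, duration)

# Two-level pattern: slow "calls", each spawning a burst of faster "pops".
calls = event_times(duration=4.0, events_per_sec=2, irregularity=0.3)
pattern = [t + event_times(0.25, 20, 0.5) for t in calls]
```

Here is an illustration of its use in creating the Peeper Night Chorus texture.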