SynTex: Synthetic audio textures dataset collection


The SynTex collection consists of audio "textures" - sounds whose description is static at some large enough temporal scale. Examples of 'real-world' textures include rain, engines, crowds, birds, room tone, ocean waves, rolling, and bubbles. Textures can also be abstract - time-frequency distributions of signal elements ranging from simple windowed sine-wave 'grains', chirps, and noise bursts to arbitrarily diverse sound elements.



Motivation


These datasets are designed and labeled to support training conditional generative audio models - models that learn to generate complex audio under parametric control.

There are plenty of databases for training speech and music synthesizers, as well as more complex audio sets (environmental audio) for training event detectors, stream segregators, and classifiers. However, training generative audio models requires densely and accurately labeled sounds - many parameters, possibly difficult or impossible to extract from recorded audio.

The datasets are synthetic - in fact, they are downloaded as code and constructed on your local machine. The default dataset can be used as a reference, but generation is controlled by a configuration file that provides options for "wrangling" the data to train, explore, or stress-test your model as necessary. You can customize anything from the sample rates and the parameter ranges you want to train with ([0, 1], [-1, 1]) that map to synthesizer parameters, to the resolution and ranges of the synthesis parameters used for dataset creation.
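
As a concrete example of that mapping, a model-facing value in [0, 1] can be rescaled linearly into a synthesizer parameter's own range. The function below is an illustrative sketch of the idea, not SynTex code:

def user_to_synth(user_val, user_min, user_max, synth_min, synth_max):
    # Linearly rescale a user-facing value (e.g. in [0, 1] or [-1, 1])
    # into the synthesizer's own parameter range.
    frac = (user_val - user_min) / (user_max - user_min)
    return synth_min + frac * (synth_max - synth_min)

# A model output of 0.5 in [0, 1] mapped onto a rate_exp range of [2, 3]:
rate_exp = user_to_synth(0.5, 0.0, 1.0, 2.0, 3.0)  # -> 2.5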

The software is open source and necessarily distributed with each dataset. A JupyterLab notebook comes with each dataset in case you are interested in extending it or developing new synthesis models and datasets.


Description


SynTex is a growing dataset collection and the software used to generate texture datasets on demand. The datasets are completely configurable via the config file. Each synthesizer comes with defaults listed in the config file, and the dataset is created automatically on running the default config file.

Configuration file:

The default configuration will always generate the exact same dataset.

Wrangling: Users control the dataset generation process via the following key attributes in the configuration file (a sketch of such a file follows the list).


Audio quality: ComputeSR, datafileSR
Texture variation: randomSeed
Synthesizer parameters: choose which synthesizer parameters are fixed or sampled
Dataset ranges and resolution: synth_minval, synth_maxval, user_nvals
Audio labels (saved to the metadata file, mapped to synth values): user_minval, user_maxval, user_nvals
Audio duration: soundDuration (per parameter setting), numChunks
Metadata formats: recordFormat (params, tfrecords, or nsjson)
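
For illustration, here is a sketch of such a configuration, written as a Python dict. The key names follow the list above; the surrounding structure, the per-parameter grouping, and all values are assumptions for illustration, not the shipped default file:

config = {
    "ComputeSR": 44100,        # sample rate used for synthesis
    "datafileSR": 16000,       # sample rate of the audio files written to disk
    "randomSeed": 0,           # fixed seed: the same config generates the same dataset
    "soundDuration": 4.0,      # seconds of audio per parameter setting
    "numChunks": 2,            # number of chunks each sound is split into
    "recordFormat": "params",  # metadata format: params, tfrecords, or nsjson
    "parameters": {            # hypothetical grouping of per-parameter settings
        "irreg_exp": {         # sampled parameter
            "synth_minval": 0.0, "synth_maxval": 1.0,
            "user_minval": 0.0, "user_maxval": 1.0, "user_nvals": 21,
        },
        "cf": {"fixed": 540.0},  # fixed parameter (hypothetical syntax)
    },
}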


Training models with SynTex datasets


Datasets can be generated with paramManager (temporarily anonymized) or TensorFlow TFRecords metadata files for your data-loading convenience. Support for pandas DataFrames is expected in the near future.


Metadata file formats


This section describes the different metadata file formats that SynTex can generate. The parameters specified in the configuration file are mapped to metadata representations in the dataset.


paramManager

The paramManager format creates a JSON representation for each audio file in the dataset, along with the parameters and their descriptions. See (temporarily anonymized) for more information about using this format for data loading.


{
    "meta": {
        "filename": "PopPatternSynth--Irregularity-00.00--v-01.params",
        "irreg_exp_user_doc": "map to natural synth irregularity param [0, 1]",
        "irreg_exp_synth_doc": "(n/event-per-second) as standard deviation of gaussian around regularly spaced events normalized by events-per-second.",
        "rate_exp_user_doc": "Fixed value. mapping to natural synth rate_exp values [2,3]",
        "rate_exp_synth_doc": "2**n events per second.",
        "cf_user_doc": "Fixed value. mapping to natural synth rate_exp values [2,3]",
        "cf_synth_doc": "Center frequency in hz."
    },
    "irreg_exp": {
        "times": [
            0,
            4
        ],
        "values": [
            0.0,
            0.0
        ],
        "units": "natural",
        "nvals": 21,
        "minval": 0,
        "maxval": 1,
        "origMinval": 0,
        "origUnits": null,
        "origMaxval": 1
    }
}
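
Because the paramManager library itself is anonymized here, the following is a plain-Python sketch of consuming a .params file directly; the interpolation step is an assumption about how the (times, values) breakpoints are meant to be read:

import json
import numpy as np

with open("PopPatternSynth--Irregularity-00.00--v-01.params") as f:
    pdata = json.load(f)

# Each parameter is stored as a breakpoint function: piecewise-linear
# interpolation between (times, values) recovers its value at any time t.
p = pdata["irreg_exp"]
t = 1.5
value = np.interp(t, p["times"], p["values"])
print(p["units"], value)  # -> natural 0.0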

TFRecords

The TFRecords format stores the audio for each segment together with its parameter metadata; a parsed example record looks like this:

{
	'audio': #tf.Tensor: shape=(88200,), dtype=float32, numpy=array([ 0.,  0. ,  0. , ..., -0.29178527, -0.30153185, -0.3096234], dtype=float32), 

	'pfname': #tf.Tensor: shape=(1,), dtype=string, numpy=array([b'newDataset/PopPatternSynth--Irregularity-00.00--v-00.params'], dtype=object), 
	'segmentNum': #tf.Tensor: shape=(1,), dtype=int64, numpy=array([0]),
	'soundDuration': #tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 2.], dtype=float32),

	'cf': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([540.], dtype=float32), 
	'cf_synth_maxval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32), 
	'cf_synth_minval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32), 
	'cf_synth_units': #tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object), 
	'cf_user_maxval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32), 
	'cf_user_minval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32), 
	'cf_user_nvals': #tf.Tensor: shape=(1,), dtype=int64, numpy=array([1]), 

	'irreg_exp': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32), 
	'irreg_exp_synth_maxval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.666], dtype=float32),
	'irreg_exp_synth_minval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.333], dtype=float32), 
	'irreg_exp_synth_units': #tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object),
	'irreg_exp_user_maxval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32), 
	'irreg_exp_user_minval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32), 
	'irreg_exp_user_nvals': #tf.Tensor: shape=(1,), dtype=int64, numpy=array([4]), 


	'rate_exp': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.], dtype=float32), 
	'rate_exp_synth_maxval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32), 
	'rate_exp_synth_minval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32), 
	'rate_exp_synth_units': #tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object), 
	'rate_exp_user_maxval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32), 
	'rate_exp_user_minval': #tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32), 
	'rate_exp_user_nvals': #tf.Tensor: shape=(1,), dtype=int64, numpy=array([1]), 
}
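
As a data-loading sketch, the record above can be parsed with tf.data. The feature keys, shapes, and dtypes are taken from the example; the file path and the choice of returned fields are assumptions:

import tensorflow as tf

feature_spec = {
    "audio": tf.io.FixedLenFeature([88200], tf.float32),
    "pfname": tf.io.FixedLenFeature([1], tf.string),
    "segmentNum": tf.io.FixedLenFeature([1], tf.int64),
    "soundDuration": tf.io.FixedLenFeature([2], tf.float32),
    "irreg_exp": tf.io.FixedLenFeature([1], tf.float32),
    "rate_exp": tf.io.FixedLenFeature([1], tf.float32),
    "cf": tf.io.FixedLenFeature([1], tf.float32),
    # ...the *_minval/_maxval/_units/_nvals keys follow the same pattern
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    # Return (waveform, conditioning value) pairs for training.
    return example["audio"], example["irreg_exp"]

dataset = (
    tf.data.TFRecordDataset("newDataset.tfrecord")  # hypothetical path
    .map(parse_example)
    .batch(8)
)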

NSJSON

The nsjson format is a compact representation that writes only the variable parameters to JSON. It is used specifically for training this version of GANSynth.


{
    "PopPatternSynth--Irregularity-00.00--v-00": {
        "irreg_exp_natural": 1.0,
        "irreg_exp_norm": 0.0,
        "samplerate": 16000,
        "sound_name": "PopPatternSynth",
        "sound_source": "PopPatternSynth",
        "sound_source_int": 1,
        "sound_source_str": "Generated"
    }
}
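
Reading the nsjson metadata is ordinary JSON handling; the sketch below assumes one top-level entry per sound file, as in the example above, and a hypothetical file name:

import json

with open("dataset.nsjson") as f:  # hypothetical file name
    meta = json.load(f)

entry = meta["PopPatternSynth--Irregularity-00.00--v-00"]
# *_norm appears to hold the normalized value used for model conditioning;
# *_natural the corresponding value in synthesizer units.
cond = entry["irreg_exp_norm"]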

Creators of the dataset


(temporarily anonymized)

How to cite


If you use the SynTex dataset in your work, please cite the paper:
To be published

You can also use the following BibTeX entry:
To be published

Acknowledgements and funding


(temporarily anonymized)

Development


Please follow developments of SynTex at (temporarily anonymized).

The SynTex DSSynth module is a hierarchical synthesis unit designed for creating audio textures with event patterns at various time scales. Here is an illustration of its use in creating the Peeper Night Chorus texture.
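
For a conceptual flavor of event-pattern control (this sketch is not the DSSynth API), the irreg_exp documentation above suggests events on a regular grid at 2**rate_exp events per second, each jittered by a Gaussian whose standard deviation is the irregularity value normalized by the event rate:

import numpy as np

def event_times(duration, rate_exp, irreg, seed=0):
    # Regular grid of events at 2**rate_exp events per second, each
    # jittered by a Gaussian with std irreg normalized by the event rate.
    rng = np.random.default_rng(seed)
    eps = 2.0 ** rate_exp  # events per second
    grid = np.arange(0.0, duration, 1.0 / eps)
    jitter = rng.normal(0.0, irreg / eps, size=grid.shape)
    return np.sort(np.clip(grid + jitter, 0.0, duration))

times = event_times(duration=4.0, rate_exp=2.0, irreg=0.5)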