The SynTex collection consists of audio "textures" - sounds whose description is static at some sufficiently large temporal scale. Examples of real-world textures include rain, engines, crowds, birds, room tone, ocean waves, rolling, and bubbles. Textures can also be abstract - made up of time-frequency distributions of signals, ranging from simple windowed sine-wave 'grains', chirps, and noise bursts to arbitrarily diverse sound elements.
These datasets are designed and labeled to support training conditional generative audio models - models that learn to generate complex audio under parametric control.
There are plenty of databases for training speech and music synthesizers, as well as more complex audio sets (environmental audio) for training event detectors, stream segregators, and classifiers. However, training generative audio models requires densely and accurately labeled sounds - many parameters that may be difficult or impossible to extract from recorded audio.
The datasets are synthetic - in fact, they are downloaded as code and constructed on your local machine. The default dataset can be used as a reference, but generation is controlled by a configuration file that provides options for "wrangling" the data to train, explore, or stress-test your model as necessary. You can customize anything from the sample rates and the parameter ranges you want to train with (e.g., [0,1] or [-1,1]) that map to synthesizer parameters, to the resolution and ranges of the synthesis parameters used for dataset creation.
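For instance, mapping a user-facing training range onto a synthesizer range is a simple linear rescaling. The function below is a minimal sketch, not part of the SynTex API; the function name is ours, and the example synth range [0.333, 0.666] is borrowed from the irreg_exp example later on this page:

```python
def user_to_synth(value, user_min=-1.0, user_max=1.0,
                  synth_min=0.333, synth_max=0.666):
    """Linearly map a user-facing training value (e.g. in [-1, 1])
    onto the range actually driving the synthesizer parameter."""
    t = (value - user_min) / (user_max - user_min)
    return synth_min + t * (synth_max - synth_min)

# A model conditioned on [-1, 1] driving irreg_exp in [0.333, 0.666]:
print(user_to_synth(0.0))  # -> 0.4995, the midpoint of the synth range
```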
The software is open source and necessarily distributed with each dataset. A JupyterLab notebook also comes with each dataset in case you are interested in extending it or developing new synthesis models and datasets.
SynTex is a growing dataset collection together with the software used to generate texture datasets on demand. The datasets are completely configurable through the config file, and each synthesizer's defaults are listed in its config file. Running the default config file creates the dataset automatically.
Wrangling: Users control the dataset generation process via the following key attributes in the configuration file (a hypothetical sketch of such a configuration follows the table).
| Wrangling aspect | Configuration attributes |
| --- | --- |
| Audio quality | ComputeSR, datafileSR |
| Texture variation | randomSeed |
| Synthesizer parameters | choose which synthesizer parameters are fixed or sampled |
| Dataset ranges and resolution | synth_minval, synth_maxval, user_nvals |
| Audio labels (saved to the metadata file, mapped to synth values) | user_minval, user_maxval, user_nvals |
| Audio duration | soundDuration (per parameter setting), numChunks |
| Metadata formats | recordFormat (params, tfrecords, or njson) |
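A wrangling configuration might look something like the following. This is a hypothetical sketch written as a Python dict for illustration only; the exact file syntax, key spellings, and defaults are defined by the config file shipped with each dataset.

```python
# Hypothetical wrangling settings; the shipped config file is authoritative.
config = {
    "ComputeSR": 44100,        # sample rate used during synthesis
    "datafileSR": 16000,       # sample rate of the audio files written to disk
    "randomSeed": 42,          # controls texture variation between runs
    "soundDuration": 4.0,      # seconds of audio per parameter setting
    "numChunks": 2,            # number of segments each sound is split into
    "recordFormat": "params",  # one of: params, tfrecords, njson
    "irreg_exp": {             # per-parameter settings: fixed or sampled
        "synth_minval": 0.333, # range actually driving the synthesizer
        "synth_maxval": 0.666,
        "user_minval": 0.0,    # range written to the metadata labels
        "user_maxval": 1.0,
        "user_nvals": 4,       # resolution: number of values sampled
    },
}
```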
Datasets can be generated with paramManager or TensorFlow TFRecord metadata files for your data-loading convenience. Support for pandas DataFrames is expected in the near future.
This section shows the different metadata file formats that SynTex can generate. The parameters specified in the configuration file are mapped to these metadata representations in the dataset.
The paramManager format creates a JSON representation for each audio file in the dataset, along with the parameters and their descriptions. See the paramManager GitHub repository for more information about using this format for data loading.
{ "meta": { "filename": "PopPatternSynth--Irregularity-00.00--v-01.params", "irreg_exp_user_doc": "map to natural synth irregularity param [0, 1]", "irreg_exp_synth_doc": "(n/event-per-second) as standard deviation of gaussian around regularly spaced events normalized by events-per-second.", "rate_exp_user_doc": "Fixed value. mapping to natural synth rate_exp values [2,3]", "rate_exp_synth_doc": "2**n events per second.", "cf_user_doc": "Fixed value. mapping to natural synth rate_exp values [2,3]", "cf_synth_doc": "Center frequency in hz." }, "irreg_exp": { "times": [ 0, 4 ], "values": [ 0.0, 0.0 ], "units": "natural", "nvals": 21, "minval": 0, "maxval": 1, "origMinval": 0, "origUnits": null, "origMaxval": 1 } }
The tfrecords format serializes each audio segment together with its parameter values, ranges, and units as TensorFlow records. A parsed record looks like this:

```
{'audio': <tf.Tensor: shape=(88200,), dtype=float32, numpy=array([ 0. , 0. , 0. , ..., -0.29178527, -0.30153185, -0.3096234 ], dtype=float32)>,
 'pfname': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'newDataset/PopPatternSynth--Irregularity-00.00--v-00.params'], dtype=object)>,
 'segmentNum': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([0])>,
 'soundDuration': <tf.Tensor: shape=(2,), dtype=float32, numpy=array([0., 2.], dtype=float32)>,
 'cf': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([540.], dtype=float32)>,
 'cf_synth_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'cf_synth_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'cf_synth_units': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object)>,
 'cf_user_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'cf_user_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'cf_user_nvals': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([1])>,
 'irreg_exp': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>,
 'irreg_exp_synth_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.666], dtype=float32)>,
 'irreg_exp_synth_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.333], dtype=float32)>,
 'irreg_exp_synth_units': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object)>,
 'irreg_exp_user_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>,
 'irreg_exp_user_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([0.], dtype=float32)>,
 'irreg_exp_user_nvals': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([4])>,
 'rate_exp': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.], dtype=float32)>,
 'rate_exp_synth_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'rate_exp_synth_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'rate_exp_synth_units': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'natural'], dtype=object)>,
 'rate_exp_user_maxval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'rate_exp_user_minval': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([inf], dtype=float32)>,
 'rate_exp_user_nvals': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([1])>}
```
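A typical way to consume such a file with tf.data might look like the following sketch. The feature spec is an assumption based on the shapes shown above (and lists only a few of the features); the authoritative spec should be taken from the notebook distributed with the dataset, and the file name is hypothetical.

```python
import tensorflow as tf

# Assumed feature spec, matching the shapes printed above; the true spec
# is defined by the SynTex writer.
feature_spec = {
    "audio": tf.io.FixedLenFeature([88200], tf.float32),
    "pfname": tf.io.FixedLenFeature([1], tf.string),
    "segmentNum": tf.io.FixedLenFeature([1], tf.int64),
    "soundDuration": tf.io.FixedLenFeature([2], tf.float32),
    "irreg_exp": tf.io.FixedLenFeature([1], tf.float32),
}

def parse(record):
    return tf.io.parse_single_example(record, feature_spec)

# "newDataset.tfrecords" is a hypothetical file name.
dataset = tf.data.TFRecordDataset("newDataset.tfrecords").map(parse)
for example in dataset.take(1):
    print(example["audio"].shape, example["irreg_exp"].numpy())
```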
The njson format is a compressed format that writes only the variable parameters into the JSON representation. It is used specifically for training this version of GANSynth.
"PopPatternSynth--Irregularity-00.00--v-00": { "irreg_exp_natural": 1.0, "irreg_exp_norm": 0.0, "samplerate": 16000, "sound_name": "PopPatternSynth", "sound_source": "PopPatternSynth", "sound_source_int": 1, "sound_source_str": "Generated" }
Lonce Wyse and Prashanth T.R. are the key contributors to this collection and its supporting software. Please send questions, comments, and bug reports to lonce.
This research has been supported by a Singapore MOE Tier 2 grant, “Learning Generative Recurrent Neural Networks”, and by an NVIDIA Corporation Academic Programs GPU grant.
Please follow developments to SynTex on its GitHub page.
The SynTex DSSynth module is a hierarchical synthesis unit designed for creating audio textures with event patterns at various time scales.
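As a generic sketch of the idea (plain NumPy, not the DSSynth API): regularly spaced events at one time scale can be jittered by a Gaussian whose standard deviation is the irregularity parameter normalized by the event rate (as the irreg_exp documentation above describes), and each event can in turn trigger a finer-scale pattern of its own.

```python
import numpy as np

rng = np.random.default_rng(42)

def event_times(duration, events_per_sec, irregularity):
    """Regularly spaced event times, jittered by a Gaussian whose standard
    deviation is irregularity / events_per_sec (cf. irreg_exp above)."""
    grid = np.arange(0.0, duration, 1.0 / events_per_sec)
    jitter = rng.normal(0.0, irregularity / events_per_sec, size=grid.shape)
    return np.clip(grid + jitter, 0.0, duration)

# Two-level pattern: slow "calls", each spawning a burst of faster "pops".
calls = event_times(duration=4.0, events_per_sec=2, irregularity=0.3)
pattern = [t + event_times(0.25, 20, 0.5) for t in calls]
```

Here is an illustration of its use in creating the Peeper Night Chorus texture.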