Project

Sketching with Data Types

Jessica Stringham

Groups

Media Lab Research Theme: Cultivating Creativity

As a machine learning engineer, I love a good parameterization that represents something interesting, like a creative coding sketch, as a core set of interesting numbers.

You can sample from distributions and automatically generate values through hyperparameter search methods! You can interpolate! You can explore the latent space!

So it might not be surprising that murrelet, the framework I built to do creative coding visuals, centers around a data structure filled with numbers and booleans. Because it is built in Rust, I can take this one step further by using algebraic data types, which provide enums to choose between things and structs to combine things together.

Experiments

Transitions

One challenge with live performances is that changing parameters can cause jarring jumps in the visuals. By interpolating, we animate the transition. I published a Rust package called Lerpable that lets you derive a trait on your data type that interpolates between two values of a deeply nested data type. It works by traversing the data type of your “before” and “after” states, and any number that has changed goes from start to end as t changes from 0 to 1.

Specifically, when comparing two values of the same type, it compares each field individually, and when comparing bools or different enum variants, it’ll do a “step” where it changes over at 0.5. It has additional functionality for vectors, where it’ll interpolate the number of items and let you fade new items in or out. The Lerpable package also gives you ways to bypass lerping or supply custom lerp methods.
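The behavior described above can be sketched by hand. This is a minimal illustration, not Lerpable's actual API or derive output: the trait name `LerpSketch`, the method `lerp_to`, and the `Config` and `Shape` types are all made up for the example, and the vector fade-in/fade-out handling is left out.

```rust
/// Hypothetical trait for this sketch; not Lerpable's real trait.
trait LerpSketch {
    fn lerp_to(&self, other: &Self, t: f32) -> Self;
}

impl LerpSketch for f32 {
    // Numbers move linearly from start to end as t goes from 0 to 1.
    fn lerp_to(&self, other: &Self, t: f32) -> Self {
        self + (other - self) * t
    }
}

impl LerpSketch for bool {
    // Booleans cannot blend, so they "step" over at t = 0.5.
    fn lerp_to(&self, other: &Self, t: f32) -> Self {
        if t < 0.5 { *self } else { *other }
    }
}

#[derive(Debug, Clone, PartialEq)]
enum Shape {
    Circle { radius: f32 },
    Square { side: f32 },
}

impl LerpSketch for Shape {
    fn lerp_to(&self, other: &Self, t: f32) -> Self {
        match (self, other) {
            // Same variant: interpolate each field.
            (Shape::Circle { radius: a }, Shape::Circle { radius: b }) => {
                Shape::Circle { radius: a.lerp_to(b, t) }
            }
            (Shape::Square { side: a }, Shape::Square { side: b }) => {
                Shape::Square { side: a.lerp_to(b, t) }
            }
            // Different variants: step over at t = 0.5.
            _ => if t < 0.5 { self.clone() } else { other.clone() },
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Config {
    scale: f32,
    filled: bool,
    shape: Shape,
}

impl LerpSketch for Config {
    // Structs recurse into each field with the same t.
    fn lerp_to(&self, other: &Self, t: f32) -> Self {
        Config {
            scale: self.scale.lerp_to(&other.scale, t),
            filled: self.filled.lerp_to(&other.filled, t),
            shape: self.shape.lerp_to(&other.shape, t),
        }
    }
}
```

Calling `before.lerp_to(&after, t)` once per frame with t increasing from 0 to 1 would animate the transition between two configurations.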

Future work could be interpolating the livecode expressions. You could also help automatically explore a space by combining two configurations in other ways, like combining parents in genetic algorithms.

Visualizing a space to aid exploration

When I take a screen capture of the output of a system, it also records a snapshot of the full data type. In an experiment last fall, I loaded the previous few months of screencap data types, squashed the features into a vector, and ran it through UMAP. UMAP is a dimensionality reduction method that pulls together items that are close to each other and nudges apart items that are far away. The result was a visual representation of the space I had explored and had found interesting results in. When I saved a new image, it would run the learned mapping and mark where I was on the map.
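One way to "squash" a nested data type into a vector is to walk it and append one number per field, one-hot encoding the enums. The sketch below is an assumption about how that could look, not the code used in the experiment; the trait `ToFeatures` and the `Sketch` and `Palette` types are hypothetical. The resulting rows would then be handed to a UMAP implementation, which is not shown.

```rust
/// Hypothetical trait: append this value's numbers to a flat feature vector.
trait ToFeatures {
    fn push_features(&self, out: &mut Vec<f32>);
}

impl ToFeatures for f32 {
    fn push_features(&self, out: &mut Vec<f32>) {
        out.push(*self);
    }
}

impl ToFeatures for bool {
    fn push_features(&self, out: &mut Vec<f32>) {
        out.push(if *self { 1.0 } else { 0.0 });
    }
}

enum Palette {
    Warm,
    Cool,
    Mono,
}

impl ToFeatures for Palette {
    // One-hot encoding: one slot per variant, so no variant is "between" two others.
    fn push_features(&self, out: &mut Vec<f32>) {
        let index = match self {
            Palette::Warm => 0,
            Palette::Cool => 1,
            Palette::Mono => 2,
        };
        for i in 0..3 {
            out.push(if i == index { 1.0 } else { 0.0 });
        }
    }
}

struct Sketch {
    noise_density: f32,
    mirrored: bool,
    palette: Palette,
}

impl ToFeatures for Sketch {
    // Structs concatenate the features of their fields in a fixed order.
    fn push_features(&self, out: &mut Vec<f32>) {
        self.noise_density.push_features(out);
        self.mirrored.push_features(out);
        self.palette.push_features(out);
    }
}

fn to_feature_vector<T: ToFeatures>(value: &T) -> Vec<f32> {
    let mut out = Vec::new();
    value.push_features(&mut out);
    out
}
```

Variable-length fields such as vectors would need padding or summarizing so that every snapshot produces a row of the same length.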

The final two images reveal an issue with the configuration-only approach: the images are structurally similar, but the fourth image ends up on a separate island. In this case, it's due to a large numeric change in a variable that controls the density of some noise added to the image, which is less noticeable at this resolution. (This was verified by adjusting the noise value in the third image, which sent its embedding location to the small island next to the fourth image's.) Some examples of next steps could be normalizing numerical values or incorporating visual information into the embedding space so UMAP has clues about which values matter.
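As one possible form of the normalization step, each feature column could be rescaled to the range [0, 1] before embedding, so that a parameter with a large numeric range does not dominate the distances. This is a sketch under assumptions: the function name `min_max_normalize` is made up, every row is assumed to have the same length, and min-max scaling is only one option (a log scale or a rank transform might suit heavy-tailed parameters better).

```rust
/// Rescale every column of `rows` to [0, 1] using that column's min and max.
/// Columns where every value is identical are set to 0.0.
/// Assumes all rows have the same length.
fn min_max_normalize(rows: &[Vec<f32>]) -> Vec<Vec<f32>> {
    if rows.is_empty() {
        return Vec::new();
    }
    let dims = rows[0].len();
    let mut mins = vec![f32::INFINITY; dims];
    let mut maxs = vec![f32::NEG_INFINITY; dims];

    // First pass: find the range of each column.
    for row in rows {
        for (i, v) in row.iter().enumerate() {
            mins[i] = mins[i].min(*v);
            maxs[i] = maxs[i].max(*v);
        }
    }

    // Second pass: rescale each value within its column's range.
    rows.iter()
        .map(|row| {
            row.iter()
                .enumerate()
                .map(|(i, v)| {
                    let range = maxs[i] - mins[i];
                    if range > 0.0 { (v - mins[i]) / range } else { 0.0 }
                })
                .collect()
        })
        .collect()
}
```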

Automatic exploration

Another area I’m looking at is applying methods from machine learning to automatically explore some of the parameter space in a way that lets you be creative.

I’m going to pause here to ramble a bit. I hypothesize that the most creative interaction is not “creating a button that generates images you like” but rather the result of improving the structure of the data type. I also think that the structure won’t necessarily be “the most expressive possibility” (even though I tend to build things to be as expressive as I can), but will instead be restricted in some ways. In my practice, I think of it as “giving the system a personality.”

Anyways. Baby steps. A first baby step would be to try out automatically exploring a parameter space. I can sample from a probability distribution to generate configurations throughout the space and use feedback to update the probability distribution.
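A minimal sketch of that sample-and-update loop, assuming an independent Gaussian per parameter that is refit each round to the samples marked valid (similar in spirit to the cross-entropy method). Everything here is hypothetical and for illustration only: the `Rng`, `Distribution`, and `explore` names, the hand-rolled xorshift generator used to avoid external crates, and the choice of distribution.

```rust
/// Tiny xorshift PRNG so the sketch has no dependencies. The seed must be nonzero.
struct Rng(u64);

impl Rng {
    /// Uniform value in [0, 1).
    fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }

    /// Standard normal sample via the Box-Muller transform.
    fn next_normal(&mut self) -> f32 {
        let u1 = self.next_f32().max(f32::MIN_POSITIVE);
        let u2 = self.next_f32();
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f32::consts::PI * u2).cos()
    }
}

/// Independent Gaussian per parameter (an assumption of this sketch).
struct Distribution {
    mean: Vec<f32>,
    std_dev: Vec<f32>,
}

impl Distribution {
    fn sample(&self, rng: &mut Rng) -> Vec<f32> {
        self.mean
            .iter()
            .zip(&self.std_dev)
            .map(|(m, s)| m + s * rng.next_normal())
            .collect()
    }

    /// Refit the mean and spread to the configurations that were marked valid.
    fn update(&mut self, valid: &[Vec<f32>]) {
        if valid.is_empty() {
            return; // No feedback this round; keep the current distribution.
        }
        let n = valid.len() as f32;
        for i in 0..self.mean.len() {
            let mean = valid.iter().map(|v| v[i]).sum::<f32>() / n;
            let var = valid.iter().map(|v| (v[i] - mean).powi(2)).sum::<f32>() / n;
            self.mean[i] = mean;
            // Keep a floor on the spread so exploration never fully collapses.
            self.std_dev[i] = var.sqrt().max(0.01);
        }
    }
}

/// Runs `rounds` of sampling and refitting; returns the valid fraction of the last round.
fn explore(
    dist: &mut Distribution,
    rng: &mut Rng,
    rounds: usize,
    per_round: usize,
    is_valid: impl Fn(&[f32]) -> bool,
) -> f32 {
    let mut valid_rate = 0.0;
    for _ in 0..rounds {
        let samples: Vec<Vec<f32>> = (0..per_round).map(|_| dist.sample(rng)).collect();
        let valid: Vec<Vec<f32>> = samples
            .into_iter()
            .filter(|s| is_valid(s.as_slice()))
            .collect();
        valid_rate = valid.len() as f32 / per_round as f32;
        dist.update(&valid);
    }
    valid_rate
}
```

Here `is_valid` stands in for whatever feedback is available, such as a check that a shape stays within bounds or a manual thumbs-up.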

An initial step is to figure out a way to randomly generate configurations.

I started by simplifying from an entire system to just a single shape drawn with a specific method that imbues it with some opinions: it usually connects curves with tangents unless a sharp angle is specified, and it applies some symmetries. As my initial test, I plan to start by marking invalid shapes (e.g., where the shape extends beyond the boundaries or is too small to see) and see if I can improve the distribution to produce fewer of those.

Generating enums

While this custom system is a fine way to start, it would be nice to be able to randomly generate data from the schema itself.

One relevant method of generating random data from data types comes from property-based testing. You can provide a schema, and it will generate fake data for you to run tests on to check for unexpected edge cases. I looked at https://hypothesis.readthedocs.io/en/latest/, which was cool but couldn’t support my wildly recursive structures yet, and http://json-schema-faker.js.org, which worked a bit better but had some bugs.

After some reflection, I might be able to actually reuse the live-coding portion of my framework for this. It already supports randomly setting floating numbers. If I could extend it to be able to dynamically select from enums as well, I would have the pieces needed.
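To make the idea concrete, here is a small sketch of randomly choosing enum variants for a recursive type, with a depth limit so generation terminates. It does not use murrelet's live-coding machinery; the `Node` type, the `random_node` function, and the hand-rolled `Rng` are all hypothetical names for illustration.

```rust
/// Tiny xorshift PRNG so the sketch has no dependencies. The seed must be nonzero.
struct Rng(u64);

impl Rng {
    /// Uniform value in [0, 1).
    fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }

    /// Uniform index in 0..n (n must be at least 1).
    fn pick(&mut self, n: usize) -> usize {
        ((self.next_f32() * n as f32) as usize).min(n - 1)
    }
}

/// A small recursive type: two leaf variants and one that nests more nodes.
#[derive(Debug)]
enum Node {
    Dot { size: f32 },
    Line { length: f32 },
    Group(Vec<Node>),
}

/// Randomly choose a variant, then randomly fill in its fields.
fn random_node(rng: &mut Rng, depth: usize) -> Node {
    // At depth 0 only the leaf variants are offered, so recursion always ends.
    let variant_count = if depth == 0 { 2 } else { 3 };
    match rng.pick(variant_count) {
        0 => Node::Dot { size: rng.next_f32() },
        1 => Node::Line { length: rng.next_f32() },
        _ => {
            let count = 1 + rng.pick(3); // 1 to 3 children
            Node::Group((0..count).map(|_| random_node(rng, depth - 1)).collect())
        }
    }
}

/// How many levels of `Group` nesting a node contains.
fn nesting(node: &Node) -> usize {
    match node {
        Node::Group(children) => 1 + children.iter().map(nesting).max().unwrap_or(0),
        _ => 0,
    }
}
```

Weighting the variant choice instead of picking uniformly would be one way to connect this to a learned distribution over configurations.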

GitHub for Murrelet live coding framework

GitHub for Lerpable