By Tod Machover and Charles Holbrow
In his brilliant, provocative 1966 essay, The Prospects of Recording, Glenn Gould proposed elevating – pardon the pun – elevator music from pernicious drone to enriching ear training. In his view, the ubiquitous presence of background sound could subversively train listeners to be sensitive to the building blocks, structural forms and hidden meanings of music, turning the art form into the universal language of the emotions that it was destined to be. In a not-unrelated development, Gould had somewhat recently traded the concert hall for the recording studio, an act echoed by The Beatles' release in 1967 of Sgt. Pepper's Lonely Hearts Club Band, an album conceived and produced in a multi-track recording studio and never meant to be played in concert. And while Gould's dream of a transformative elevator music never quite panned out, it is clear that from the 1940s through the '60s – from Les Paul and Mary Ford's pioneering use of overdubs in How High the Moon, to the birth of rock and roll with Chuck Berry's "Maybellene" in 1955, and on to Schaeffer, Stockhausen, Gould, The Beatles and many more – a totally new art form, enabled by magnetic tape recording and processing, was born.
Today we are at a similar crossroads. Music streaming – and, in general, music distribution and networking via the Internet – has become the "elevator music" of our time, offering endless songs and sounds, all supposedly adapted to our tastes and primed for making social connections. But many of the current trends are not promising, and can even be seen as leading to the downgrading of music's potential. Algorithmic curation is still primitive and often proposes paler – not bolder – versions of music we supposedly like. Current machine-learning techniques for music generation produce generic, composer-less pieces that sort-of sound like something, but never sound great. And it could be argued that the vast potential of the Internet as an artistic medium has not yet resulted in a new kind of music, as potently different in form and content from what surrounds us as magnetic tape music was from live performance. In fact, it seems as if the Internet and streaming have changed everything about music except music itself.
The key to harnessing the power of streaming to create something really new might be to turn the medium's ubiquity and fluidity into an advantage. Can we meaningfully allow for a given piece of music to morph and evolve with different impact on each hearing? Can this mutability engage artists' imaginations in new ways? Can listeners – or even the entire environment – play important collaborative roles in building such a "living music" culture? Several current projects at the MIT Media Lab, where we work, explore various forms that dynamically streamed music might take.
The current paradigm – unchanged in the streaming era – is to treat a static recording as the terminal and canonical version of a composition. But a mastered, unchanged, "finished" recording is actually a limited representation of a composition. It is also not always what artists actually want. John Cage and many others invented numerous open forms to allow for multiple compositional (not merely expressive) interpretations, and Pierre Boulez famously revised most of his pieces from year to year, often without leaving a "definitive" version. After The Beatles stopped recording together in the early 1970s, John Lennon told George Martin that he was unsatisfied with their catalog and wished to re-record everything the band ever released (especially "Strawberry Fields," apparently). And of course, prior to Edison's first phonograph in 1877, every single musical performance was unique by necessity and could never be repeated without variation.
When recorded music was primarily distributed on physical media, finalizing a recording was an essential step. Now that music is primarily distributed over the Internet, this constraint has been lifted. Music can now, again, be less about the master recording and more about the dialogue between artist and medium, artist and public, or music and the world itself. Labels and artists have begun to scratch the surface of what is possible. Consider the now-common pattern: An artist releases a song, and if that song starts to get traction on social media, it is quickly followed by an acoustic version, a music video and then countless club remixes. This is a first-step example of how a recording can change after it is first released, but it is currently the only option available within the narrow confines of popular streaming platforms.
In the future, artists will push the concept of evolving music much further. Instead of releasing a static recording, artists could release music that is dynamic, fluid and open for reinterpretation, remixing and reimagining. This would undoubtedly develop in numerous, well, streams — some of which we are currently working on.
A first example experiments with an open-form approach to music production. Conventional pop songs today layer tens, hundreds, or even thousands of different sounds together. Before a song is released, the relative loudness level of all parts is finalized in a studio in the "mixdown" process, during which the structure of the song, the instrumentation, and all the additional audio effects are locked into place, resulting in a final arrangement. In the conventional workflow, a mix engineer is responsible for every tone, level and effect configuration for all the separate parts. The techniques we are currently developing enable the engineer to share control over the mixdown and arrangement with intelligent algorithmic processes. The most obvious use for this kind of music production software would be to train AI agents to perform some of the simpler parts of the mixing process; for example, a software agent could be taught to set the balance between the main vocal part of a song and the background. It might also help a musician or engineer prepare a song for release more efficiently. On its own, however, it does not enable a kind of music that is fundamentally different from the original model provided.
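To make the vocal-versus-background example concrete, here is a minimal sketch of that one mixing decision. It is a fixed rule standing in for a trained agent, not the software described above: the function names, the 3 dB target and the use of simple RMS level are all our illustrative assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        raise ValueError("cannot measure an empty signal")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def balance_gain(vocal, backing, target_db=3.0):
    """Linear gain that places the vocal target_db above the backing.

    Hypothetical helper: the 3 dB default is an arbitrary illustration,
    not a value taken from any real mixing system.
    """
    vocal_level, backing_level = rms(vocal), rms(backing)
    if vocal_level == 0.0 or backing_level == 0.0:
        raise ValueError("cannot balance against a silent signal")
    current_db = 20.0 * math.log10(vocal_level / backing_level)
    return 10.0 ** ((target_db - current_db) / 20.0)


def mix(vocal, backing, gain):
    """Sum the gain-adjusted vocal with the backing, sample by sample."""
    return [gain * v + b for v, b in zip(vocal, backing)]


# Toy signals: the vocal starts out at half the level of the backing.
vocal = [0.1, -0.1] * 100
backing = [0.2, -0.2] * 100
gain = balance_gain(vocal, backing)
mixed = mix(vocal, backing, gain)
```

A learned agent would replace the fixed target with a decision that depends on the material, but the output it hands back to the engineer could still be this small: one gain value per part.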
The more exciting potential comes from working toward an idea where music is not the output of such a system, but is in fact the system itself. From this perspective, we could imagine and create a whole range of musical experiences that would not fit inside today's streaming music paradigms and techniques.
To go beyond this "smart mix" model, Charles is working on an "Evolving Media" environment, through which a music composition changes as time passes. In particular, he is creating a feedback loop that causes a recording to permanently update itself based on how it is consumed and shared on the Internet. To make this possible, he is redesigning multiple existing technologies, from the software that we use to record, synthesize and mix music, to the cloud servers that stream content to listeners, to the playback apps on listeners' devices, and interconnecting them all in a single, iterative platform, allowing for:
- Notation and annotation by the artist, bundled like enhanced, hyperlinked liner notes.
- Compositions that can be updated or revised, either by the artists or algorithmically.
- Much more practical ways for other artists to remix, cover and collaborate.
- A history of the song's evolution, left behind by the system as a record of that song's compositional process.
- "Procedural" content that could produce "infinite compositions" that evolve forever.
- The possibility that, as with Snapchat, only the current state of the evolving composition would be available to listeners or collaborators, then gone forever, making forward evolution an essential – and only partially controllable – part of the composition itself.
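The list above can be pictured as a small data structure. The sketch below is only a toy model under our own assumptions, not the Evolving Media platform itself: the class names, the idea of representing a song as per-stem gain values, and the use of listener skip rates as the feedback signal are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    """One recorded state of the composition (hypothetical structure)."""
    number: int
    stem_gains: dict
    note: str


@dataclass
class EvolvingComposition:
    stem_gains: dict            # e.g. {"vocal": 1.0, "drums": 1.0}
    ephemeral: bool = False     # if True, keep only the current state
    history: list = field(default_factory=list)
    _count: int = 0

    def __post_init__(self):
        self.stem_gains = dict(self.stem_gains)  # avoid mutating the caller's dict
        self._commit("initial release")

    def _commit(self, note):
        """Record the current state; ephemeral songs discard older states."""
        self._count += 1
        version = Version(self._count, dict(self.stem_gains), note)
        self.history = [version] if self.ephemeral else self.history + [version]

    def revise(self, changes, note):
        """Revision by the artist: overwrite selected stem gains."""
        self.stem_gains.update(changes)
        self._commit(note)

    def absorb_feedback(self, skip_rates, step=0.1):
        """Algorithmic revision: turn down stems that listeners tend to skip."""
        for stem, rate in skip_rates.items():
            if stem in self.stem_gains:
                self.stem_gains[stem] *= 1.0 - step * rate
        self._commit("listener feedback")


song = EvolvingComposition({"vocal": 1.0, "drums": 1.0})
song.revise({"drums": 0.8}, "artist remix")
song.absorb_feedback({"drums": 0.5})

fleeting = EvolvingComposition({"vocal": 1.0}, ephemeral=True)
fleeting.revise({"vocal": 0.5}, "artist remix")
```

With `ephemeral=False` the object keeps the full compositional history; with `ephemeral=True` only the latest state survives, which is the Snapchat-like case in the last bullet.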
Another example of an evolving, collaborative composition process is represented by the City Symphony series, developed by Tod and his colleagues in the MIT Media Lab's Opera of the Future group. Started as a collaboration with the Toronto Symphony Orchestra in 2013, these projects develop a sonic portrait of a city using both "musical" and "found" sounds, and invite the creative participation of anyone who lives in that place and wants to contribute. Using the shared experience of locale as a unifying element, the symphonies have established unusual dialogue between very diverse members of each community, from Perth to Lucerne to Edinburgh, and from Philadelphia to Miami to Detroit, all pulled together through Tod's compositional vision. Special mobile apps were developed for each city that allow the public to record sounds that they would like to contribute to the project. All sounds are tagged geographically and form a growing sonic map of the city. Constellation software automatically analyzes, organizes and color-codes the collected sounds, arraying them to be mixed by anyone online with mouse or finger. These "city mixes" are in turn uploaded to be shared and further morphed, creating an ever-changing city soundscape that can be incorporated into the final symphony. Numerous other apps and online tools have been specially designed for each city – such as Media Scores and live online collaboration sessions – to facilitate creative public participation. The next series of City Symphonies, currently in development, will extend the city model to countries, such as a first-ever collaboration between citizens of South and North Korea, and a "world trade" symphony for Dubai that will continue to evolve – publicly and via streaming – far into the future.
Tools are currently being developed here at the Lab to intelligently automate the making of sonically meaningful connections between collected clips in a massive database, and between "noisy" and "musical" sounds, something that is normally done manually and is impossible to accomplish at scale. The Media Lab's "Cognitive Audio" project takes an even more radical approach. Musician/scientists Ishwarya Ananthabhotla and David Ramsay are working on a system that allows the generation of constantly evolving compositions, based on the intriguing sounds surrounding us that we may not even notice. Using cutting-edge research in psychoacoustics, auditory scene analysis and auditory memory-recall, their software can take hours of recorded ambient sounds from the found environment and then automatically select and edit the sounds that we are likely to find most interesting and might most want to remember. Then, by measuring our mood through preference tests and biometric readings fed through machine learning algorithms, the system produces constantly streaming audio experiences that turn the everyday into an emotional, personalized, musically relevant, memory-enhancing journey.