AI SLOP
by DARYL ANSELMO
— SEP 2024
ARTIST NOTE
This "starter article" is a work in progress. I intend to provide more visual samples and callouts shortly -- for now I just wanted to get something out as quick as possible after the release of reel 91, as a commitment to those following along and who have asked about the project.
ABSTRACT
2024 has become the year of AI-based video models and workflows.
AI Slop is a collection of 91 reels cut in 91 days, and represents a deep dive into the state of the art in genAI audio and video tools, using a mix of both open source and commercially available models.
This round, Anselmo's 11th consecutive, ran from June 30 to September 28, 2024, and used the following tools: Luma Dream Machine, Runway Gen3, KlingAI, AnimateDiff (via ComfyUI), Krea, Topaz Video AI, Magnific, Midjourney, Stable Diffusion, Arcana Labs, Flux, Udio and Suno. All reels were cut in Premiere Pro.
SAMPLE SPECIMENS COMING SOON
PROCESS
Many techniques were used over the course of AI Slop -- but by the end of the project, a fairly consistent process had been established:
Step 1: Image Generation
SDXL and Flux were both used in some of the earlier reels, but most of the reels near the end of the project used Midjourney for image generation. Notes:
- The `--p` flag (aesthetic personalization) is a powerful Midjourney feature for developing your own personal aesthetic. It activates once you have provided enough image rankings to the service, and it tilts your outputs toward your own aesthetic preferences. After ranking hundreds of images, my personalization was curated towards a dark, gritty, dystopian look -- one that closely reflects my own taste.
- Appending `--ar 9:16` to the end of a prompt in Midjourney will produce outputs in an aspect ratio optimal for Instagram reels (see the prompt sketch after this list).
- For any given reel, about 40-50 images were generated. A "zoomed out" variant of each image was also generated, to allow for zoom-cuts in the edit.
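As a quick illustration of how those two flags come together, here is a minimal Python sketch that appends them to a batch of base prompts. The base prompt text below is hypothetical, not taken from an actual reel:

```python
# Hypothetical example: appending the personalization and aspect-ratio
# flags used in this project to a batch of base prompts.
base_prompts = [
    "abandoned brutalist tower at dusk, fog rolling in",   # placeholder prompt text
    "rusted machinery half-buried in black sand",          # placeholder prompt text
]

def to_reel_prompt(prompt: str) -> str:
    # --p      : aesthetic personalization (requires prior image rankings)
    # --ar 9:16: portrait ratio suited to Instagram reels
    return f"{prompt} --p --ar 9:16"

for p in base_prompts:
    print(to_reel_prompt(p))
```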
SAMPLE IMAGES COMING SOON
Step 2: Video Generation
Next, various img2vid services are used to convert the images to video shots.
Earlier reels in the project started with AnimateDiff, using minor modifications of commonly available open-source ComfyUI workflows. The latter part of the project pivoted to commercially available img2vid services (Luma, Runway, Kling). These services take an image in combination with a text prompt to produce 5 seconds of footage. Notes:
- KlingAI is (hands down) the best video model at the time of writing. It is the most creative, provides surprising and delightful outputs at the highest quality, has the best "hit rate", and its ability to run dozens of concurrent generations gives it a distinct edge over the others.
- Luma Dream Machine is a solid alternative, although the hit rate is a little lower. Luma sometimes generates generic two-dimensional shots that don't follow the prompt -- the camera simply pans across the image with no movement, lacking proper depth and object segmentation. When it does hit, it's great, but the overall process is more tedious than Kling. Dream Machine also has an API which can be used for automation, an important feature that I will be exploring more in the next project (a rough sketch of what that could look like follows this list).
- Aside from test shots, Runway Gen3 was not used in any of the reels. It does not currently support a portrait aspect ratio, which makes it less useful for social media.
- Typically, about 40-50 video clips were generated for a given reel.
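As a follow-up to the Dream Machine note above, here is a rough, untested Python sketch of what automating the img2vid step over an API might look like. The endpoint URL, payload fields and `LUMA_API_KEY` variable are placeholders illustrating a generic REST pattern, not the documented Dream Machine API:

```python
# Hypothetical sketch of batching img2vid requests against a REST API.
# NOTE: the endpoint URL and payload fields below are assumptions for
# illustration -- consult the official API docs for the real schema.
import os
import requests

API_URL = "https://api.example.com/dream-machine/generations"  # placeholder endpoint
API_KEY = os.environ["LUMA_API_KEY"]                           # placeholder env var

def submit_shot(image_url: str, prompt: str) -> str:
    """Submit one image + prompt pair and return the generation id."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"image_url": image_url, "prompt": prompt, "aspect_ratio": "9:16"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["id"]

# Example: queue the 40-50 stills generated for a reel.
shots = [
    ("https://example.com/stills/still_01.png", "slow push-in, dust drifting"),
    ("https://example.com/stills/still_02.png", "camera orbits the structure"),
]
generation_ids = [submit_shot(url, prompt) for url, prompt in shots]
print(generation_ids)
```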
SAMPLE IMAGES COMING SOON
Step 3: Music Generation
Suno and Udio were used to generate the music. Both of these tools are equally useful, for different reasons. Notes:
- Suno, in general, does well with "instrument" prompting. Terms like "xylophone, claves, didgeridoo, sitar, synth" typically produce results true to their actual sounds. Suno is also fast; within seconds you can begin listening to your output.
- Udio, on the other hand, doesn't perform as well with "instrument" prompting. Where it shines is with "genre" prompting. Terms like "orchestral trailer, dark synthwave, acid, psychedelic, doom, horror synth" tend to produce highly varied, high-quality results. Udio also does very well with emotional tones, such as "eerie, melancholic, weird, epic". Udio takes a little longer to generate a track than Suno.
- For each reel, anywhere between 10 and 30 songs were generated; the best 2-5 options were then shortlisted for the edit.
- All music generated for AI Slop was entirely instrumental. Vocal tracks would have added another dimension of complexity to a process that was being optimized for speed.
- AI music was generated for about 85% of the reels. Some earlier reels used licensed tracks on Instagram, which had to be muted when posted on Threads.
SAMPLE TRACKS COMING SOON
Artist Note: "i have had many requests for the music tracks as downloads. i am considering releasing a small digital download album -- thinking something like 10 tracks and a simple PDF with all the prompts and/or settings used to generate them." Register interest here.
Step 4: The Edit
Once a few musical "bangers" are generated, they, along with all the video clips, are imported into an Adobe Premiere Pro project. The edit is done "the old-fashioned way" -- the only part of the process that does not use any automation or generation.
- Video clips are imported into the timeline sequentially, and scaled up to fit a 1080x1920 template at 60fps.
- A "listening session" is done for each of the shortlisted songs. Here, the audio and video is paired together and being evaluated for how well they vibe. It is here where it is determined whether a good cut could be made from the combined material.
- After the listening sessions, one music track "stands out" from the pack. This track is selected, and a more detailed sound edit comes first. This typically involves extracting two four-bar loops, or a 20-30 second segment of music that makes sense for the visuals. Cuts are added on the music's beat -- typically at every bar, sometimes every half-bar (see the timing sketch after this list).
- Every video clip is watched in sequence against the music, then ranked by visual interest: low, medium or high. High-interest clips are frontloaded at the start or placed at the close of the edit, medium-interest clips go in the middle, and low-interest clips are either used as filler or discarded from the edit altogether.
- Current AI video generators tend to produce movement in awkward slow motion, or with excessive camera moves. All clips are individually reviewed for speed, and are typically sped up to 1.5x or 2x here to make the movement appear more natural.
- Clips are then inserted and structured into the framework created by the audio beat. The edit is continually refined, testing each cut for interest, continuity, speed, overall emotion, and how well it all creates an implied narrative.
- The edit is then polished, often through the addition of simple Ken Burns-style zooms and/or zoom cuts on audio half-measures.
- Finally, the edit is given a color grade and enhancement pass. Sharpen, Unsharp Mask and Noise are added to every clip. Lumetri in Premiere is used to grade and match each shot.
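For those curious about the beat math behind the bar and half-bar cuts, here is a small Python sketch of how the cut points line up in time. The 100 BPM tempo and 4/4 time signature are assumptions for illustration; actual tempos varied from track to track:

```python
# Hypothetical beat math for placing cuts on bars and half-bars.
# Assumes 4/4 time; the tempo is illustrative, not from a specific reel.
BPM = 100               # assumed tempo
BEATS_PER_BAR = 4       # 4/4 time

beat_len = 60.0 / BPM               # seconds per beat
bar_len = beat_len * BEATS_PER_BAR  # seconds per bar
loop_len = bar_len * 4              # one four-bar loop

print(f"bar = {bar_len:.2f}s, half-bar = {bar_len / 2:.2f}s, 4-bar loop = {loop_len:.2f}s")

# Cut points (in seconds) for two back-to-back four-bar loops (8 bars),
# cutting at every bar line:
cuts = [round(i * bar_len, 2) for i in range(9)]
print(cuts)  # at 100 BPM: [0.0, 2.4, 4.8, 7.2, 9.6, 12.0, 14.4, 16.8, 19.2]
```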
SAMPLE CALLOUTS COMING SOON
Step 5: Final Output
Once the final cut has been achieved in Premiere Pro, the reel is output at 1080x1920, 60fps, in H.264 format with AAC audio. 2-pass VBR encoding is used, at a max target of 50 Mbps (a rough ffmpeg approximation of these settings follows the notes below).
- The MP4 file output from Adobe Premiere Pro is then run through one final interpolation and enhancement step in Topaz Video AI. The Chronos Interpolation model is set to 60fps, and the Rhea Enhancement model is used, with another sharpening and detail pass, and film grain applied.
- This process takes about 10 minutes to compute locally on a 4090-equipped workstation, or ~3 hours on a MacBook Pro when on the road. Once finished, the reel is complete and ready for posting to social media.
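For readers working outside Premiere, here is a rough Python/ffmpeg sketch approximating the export settings above (1080x1920, 60fps, H.264 with AAC, 2-pass VBR under a 50 Mbps cap). The file names and the 35 Mbps target bitrate are assumptions; the flags are a generic ffmpeg recipe, not Adobe's exact encoder configuration:

```python
# Hypothetical sketch: approximating the Premiere export settings with ffmpeg.
# File names and the 35M target bitrate are placeholders/assumptions; the
# source specifies a 50 Mbps cap, used here as -maxrate.
import subprocess

SRC = "reel_edit.mov"      # placeholder: the edited sequence
OUT = "reel_final.mp4"     # placeholder: the delivery file

common = [
    "ffmpeg", "-y", "-i", SRC,
    "-vf", "scale=1080:1920", "-r", "60",
    "-c:v", "libx264", "-b:v", "35M", "-maxrate", "50M", "-bufsize", "100M",
]

# Pass 1: analysis only, no audio, output discarded (use "NUL" on Windows).
subprocess.run(common + ["-pass", "1", "-an", "-f", "null", "/dev/null"], check=True)

# Pass 2: final encode with AAC audio.
subprocess.run(common + ["-pass", "2", "-c:a", "aac", "-b:a", "320k", OUT], check=True)
```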
SAMPLE CALLOUTS COMING SOON
DISCOVERIES
This project was designed to get the artist up to speed on the state of the art in AI audio and video tools, develop editing skills in non-narrative media, and allow expansion into darker, more surreal themes than he is typically known for.
Several specific discoveries (both technical and creative) were made during this process and will eventually be added to this document.
COLLECTION
The full collection will be posted shortly.
BIO
Daryl Anselmo is a Canadian-American artist, director, advisor, and founder. He is the co-creator of the original NBA Street and Def Jam franchises for Electronic Arts, was the Art/Creative Director for FarmVille 2 at Zynga, and served for many years as a Director of Art for The Walt Disney Company.
Now an artist and proponent of the creative use of AI-based workflows, Daryl has lectured at numerous institutions including Stanford University, SIGGRAPH, UC Berkeley, and Google. His work was showcased on the Main Stage at TED 2023.
Currently splitting his time between San Francisco and Vancouver, Daryl is obsessed with technology and writes his own code. He is currently deepening his art practice and providing consulting and creative services for various clients.
INFO
- 91 reels, released daily between June 30 and September 28, 2024
A limited run of prints is available. Contact the artist here.