Custom Models: A Guide to Stable Diffusion Textual Inversions

Billy @ Generative Labs
Apr 6, 2023

Updated: Jul 10, 2023

Creating Personalized Generative Models with Stable Diffusion Textual Inversions

TLDR: 🎨 Textual inversion is a method for customizing a Stable Diffusion model with new images. 🤗 Hugging Face's Google Colab notebooks make it easy to do this.

In the ever-evolving world of digital art and machine learning, artists and creators are constantly seeking innovative ways to enhance their creative processes.

One such method that has gained traction is Stable Diffusion textual inversion. But what exactly is it, how is it used, and why does it matter? In this blog post, we'll explore the key concepts of textual inversion and how it can transform your generative AI process.

What are textual inversions?

Textual inversions allow users to personalize a stable diffusion model with their own images.

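After training on just three to five sample images, Stable Diffusion can pick up the concept those images share and generate new images based on that object or style. Under the hood, training does not rewrite the model: it teaches the model a new "word" (a placeholder token) that stands for your concept.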

This powerful method lets artists create unique, personalized content while keeping the generated images consistent with the trained concept.

By using just 3-5 images you can teach new concepts to a model such as Stable Diffusion for personalized image generation (image source).
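For readers who like to see the idea in code, here is a toy sketch in plain Python. It is not the real training loop: the function name, the tiny two-number "embeddings", and the squared-error loss are all assumptions made for illustration (real textual inversion optimizes the new token against Stable Diffusion's denoising loss). What it does show is the core trick: only the vector for the new placeholder token is updated, and everything else stays frozen.

```python
def train_placeholder_embedding(embeddings, init_word, placeholder, targets,
                                steps=200, lr=0.1):
    """Toy stand-in for textual inversion (hypothetical helper, not a library API).

    embeddings: dict mapping existing words to vectors (treated as frozen)
    init_word:  existing word whose vector initialises the new token
    placeholder: name of the new token, e.g. "<my-toy>"
    targets:    feature vectors standing in for the training images
    """
    # Start the new token from a related, existing word.
    vec = list(embeddings[init_word])
    n = len(targets)
    for _ in range(steps):
        # Gradient of the mean squared error between vec and the targets.
        grad = [sum(2 * (vec[i] - t[i]) for t in targets) / n
                for i in range(len(vec))]
        # Only the new vector is updated; the existing table is never touched.
        vec = [v - lr * g for v, g in zip(vec, grad)]
    updated = dict(embeddings)
    updated[placeholder] = vec
    return updated
```

Calling it with a small table and a couple of target vectors returns a copy of the table with one extra entry for the placeholder, while the original words keep their original vectors.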

How to create textual inversions?

To make the process of training Stable Diffusion models accessible, Hugging Face has created two Google Colab notebooks: one for training and one for generating images. With these notebooks, artists can teach Stable Diffusion a concept from their own images and then apply that concept to brand-new images. This streamlined process empowers creators to experiment with their art and generate novel and visually striking content.

The Training Process

To get started with training a stable diffusion model, users need:

A free Google Colab account
A free Hugging Face 🤗 account
An access token generated from Hugging Face
A few images for training data

Google Colab is a free cloud-based platform that allows users to write, run, and share Python code in interactive Jupyter notebooks.

Hugging Face 🤗 is a platform that provides tools and resources for natural language processing and machine learning.
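The Colab notebooks will normally prompt you for the access token, but if you ever run things outside Colab, a small helper like the one below can keep the token out of your code. This is a minimal sketch: the helper names are made up for this post, and storing the token in an environment variable called HF_TOKEN is an assumption about your setup. The only library call used is login from the huggingface_hub package, which must be installed separately.

```python
import os


def read_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the access token stored in an environment variable.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    The variable name HF_TOKEN is an assumption about how you stored the token.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var_name} to your Hugging Face access token first."
        )
    return token


def log_in(token):
    """Authenticate this session (requires `pip install huggingface_hub`)."""
    # Imported lazily so read_hf_token works even without the package installed.
    from huggingface_hub import login

    login(token=token)
```

Typical use would be log_in(read_hf_token()) at the top of a script.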

The training images can either be of an object or a particular style.

👉 For an object, photos of the object from different angles are needed so that the stable diffusion model can understand how the object looks.

👉 For a style, images that clearly showcase the style need to be included. By providing the model with these samples, artists can train the textual inversion model to generate new images that align with their creative vision.
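The object-versus-style choice mostly changes the captions that get paired with your images during training. The sketch below illustrates that with a few example templates. The template wording is modeled on the kind used in Hugging Face's textual inversion example, but the lists, the function name, and the parameter names here are illustrative assumptions rather than the notebook's exact code.

```python
# Example caption templates; "{}" is where the placeholder token goes.
OBJECT_TEMPLATES = [
    "a photo of a {}",
    "a close-up photo of a {}",
    "a cropped photo of the {}",
]
STYLE_TEMPLATES = [
    "a painting in the style of {}",
    "a rendering in the style of {}",
    "a picture in the style of {}",
]


def training_prompts(placeholder_token, what_to_teach):
    """Build training captions for an "object" or a "style" concept."""
    if what_to_teach == "object":
        templates = OBJECT_TEMPLATES
    elif what_to_teach == "style":
        templates = STYLE_TEMPLATES
    else:
        raise ValueError('what_to_teach must be "object" or "style"')
    return [template.format(placeholder_token) for template in templates]
```

For example, training_prompts("<my-mug>", "object") yields captions that describe a thing, while training_prompts("<my-look>", "style") yields captions that describe a look.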

Steps to Train a Textual Inversion

Assuming you have Google Colab and Hugging Face accounts and have generated a Hugging Face access token, here's what you need to do:

1. Gather your training images.
   - Decide whether you want to train Stable Diffusion to recognize an object or a particular style.
   - Gather three to five images of the subject (object or style) for training. For an object, take photos of the object from different angles. For a style, include images that clearly showcase the style.
2. Open the training Google Colab notebook provided by Hugging Face [link].
3. Input the links to your images in the notebook.
4. Choose whether you want to train the model for an "object" or a "style" and provide a placeholder token (name) for the concept.
5. Run the training process and wait for it to complete (may take one to four hours).
6. Choose whether to save the concept to the public library on Hugging Face or keep it private. If keeping it private, download the learned_embeds.bin file.
7. Open the second Google Colab notebook to generate new images based on the trained concept.
8. Input the name of the concept or upload the learned_embeds.bin file in the second notebook.
9. Provide a prompt for generating new images and run the process.
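If you would rather run the generation step in your own Python environment instead of the second notebook, it can be sketched with the diffusers library roughly as follows. Treat this as a sketch under assumptions, not the notebook's code: the function names are made up for this post, the base model and concept identifiers are whatever you pass in, and you need torch and diffusers installed (a GPU is strongly recommended).

```python
def check_prompt(prompt, placeholder_token):
    """Make sure the prompt actually mentions the trained placeholder token."""
    if placeholder_token not in prompt:
        raise ValueError(f"Prompt should contain {placeholder_token!r}")
    return prompt


def generate_images(prompt, concept, base_model, token=None, num_images=1):
    """Generate images with a trained textual inversion (hypothetical helper).

    concept:    a concept repository name on Hugging Face, or a local path
                to a downloaded learned_embeds.bin file
    base_model: the Stable Diffusion checkpoint the concept was trained on
    token:      optional placeholder token to register the embedding under
    """
    # Heavy imports are done lazily so check_prompt works without them.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionPipeline.from_pretrained(base_model, torch_dtype=dtype)
    pipe = pipe.to(device)
    # Adds the learned embedding to the pipeline's tokenizer and text encoder.
    pipe.load_textual_inversion(concept, token=token)
    result = pipe(prompt, num_images_per_prompt=num_images)
    return result.images  # list of PIL images
```

A call might look like generate_images(check_prompt("a photo of <cat-toy> on a beach", "<cat-toy>"), "sd-concepts-library/cat-toy", "CompVis/stable-diffusion-v1-4"), where both identifiers are examples only; use the concept you trained and the base model it was trained against.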

Tada 🎉 View and enjoy the generated images based on your trained concept 👏

🤔 Hungry for more technical details? If it's theory you're looking for, here's the original paper. If you'd like something more accessible and practical, here's a deep dive.

Why Stable Diffusion Textual Inversion Matters

Stable Diffusion textual inversion is more than just a technical concept: it's a tool that unlocks creative possibilities.

By allowing artists to personalize stable diffusion models with their own images, this method enables the creation of unique and customized content. Whether it's experimenting with different styles, exploring new artistic directions, or generating visually compelling images, stable diffusion textual inversion is a valuable addition to any artist's toolkit.

It's a powerful and versatile method that can enhance and transform your creative process. So go ahead, experiment with stable diffusion textual inversion, and have fun!