PhotoMaker V2 AI Photo One-Click PC & Mac Running Package!

A few days ago, Tencent’s open-source project PhotoMaker received a significant update: PhotoMaker V2! This version is more powerful and can generate personalized Olympic-style photos in seconds. It’s simply amazing!

What is PhotoMaker V2?

If you haven’t heard of PhotoMaker yet, you’re missing out on a fantastic tool! PhotoMaker V2 customizes personalized character images in seconds by stacking ID embeddings, with no additional LoRA training required. Just upload a few photos of yourself, add a text prompt or other reference images, and you can generate photos of yourself in various scenes, states, and styles!

How does it work?

The working principle of PhotoMaker V2 is super cool! It uses multiple input images to create a unified ID embedding representation. These embedding vectors contain various characteristics of the person (such as facial features, hairstyle, expressions, etc.). Then, it uses this ID representation to generate images, maintaining the consistency of the person’s features. It’s fast, high-quality, and can generate customized photos based on text descriptions!

Performance Optimization

The new version improves the ID fidelity of the generated images while maintaining quality. On a V100 graphics card, the time per image has dropped from 1 minute to 14 seconds, which is roughly 4x faster. That’s lightning fast!

Package Download link: https://www.patreon.com/posts/photomaker-v2-ai-109026532

PC One-Click Startup Package

Good things are meant to be shared! PhotoMaker V2 has been made into a one-click startup package. Just click to use it, without worrying about various configuration issues.

Computer Requirements

  • Windows 10/11 64-bit operating system
  • NVIDIA graphics card with 8 GB of VRAM or more
  • CUDA >= 12.1

Download and Usage Tutorial

  1. Extract the Files:
    Extract the package to a folder whose path contains no Chinese characters, then double-click the “run.exe” file to start it.

  2. Browser Access:
    The software will automatically open the interface in your browser.
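
If you are unsure whether your extraction folder is safe, a quick check like the one below can flag problem folder names before you launch. This is only a minimal sketch: the helper name `non_ascii_parts` and the example path are made up for illustration, and it flags any non-ASCII character, which is a broader test than the "no Chinese paths" advice above.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path: str) -> list[str]:
    """Return the components of a Windows path that contain non-ASCII characters."""
    # PureWindowsPath parses Windows-style paths on any operating system.
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


# Hypothetical extraction folder, used only as an example.
bad = non_ascii_parts(r"C:\AI\照片\PhotoMaker")
print(bad)  # ['照片']
```

An empty list means every folder name in the path is plain ASCII.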

Mac Running Package

Note: Only Macs with Apple M1/M2/M3 series chips are supported.

Installation Steps

  1. Download the DMG image file from the link above and drag the app file into the Applications folder.
  2. For the first launch, do not open the app from Launchpad. Instead, right-click it in the Applications folder and choose Open.
  3. The software will automatically open the operation interface in the default browser, and you can start using it in the browser.

Technical Highlights

  • Efficient Personalized Generation: PhotoMaker encodes any number of input ID images into stacked ID embeddings, retaining ID information. This embedding not only encapsulates the characteristics of the same input ID but also accommodates different IDs, providing possibilities for subsequent integration.
  • ID-Oriented Data Construction Pipeline: The research team proposed an ID-oriented data construction pipeline to drive PhotoMaker’s training. In testing, PhotoMaker retains identity better than fine-tuning methods while offering significant speed improvements and high-quality generation results.
  • Wide Application Range: It can generate realistic photos, and it can take people depicted in artistic paintings, sculptures, or old photos from the last century or even ancient times and bring them into this century. It also allows for stylization while retaining ID attributes, and can change gender and age by simply replacing class words.
  • Identity Mixing: If users provide images of different IDs as input, PhotoMaker can integrate the characteristics of different IDs well to form a new ID. This can be achieved by controlling the percentage of identity images in the input image pool or by weighting prompts to adjust the merging ratio.

Method Analysis

PhotoMaker’s method mainly includes the following steps:

  1. Text and Image Encoding: First, obtain text embeddings and image embeddings from the text encoder and image encoder.
  2. Fusion Embedding: Extract fused embeddings by merging the corresponding class embedding (e.g., for the class word “man” or “woman”) with each image embedding.
  3. Stacked ID Embedding: Concatenate all fused embeddings along the length dimension to form the stacked ID embedding.
  4. Adaptive Merging: Feed the stacked ID embeddings into all cross-attention layers to adaptively merge ID content in the diffusion model.

During training, this method uses images of the same ID with masked backgrounds. During inference, however, images of different IDs can be input directly, without background distortion, to create a new ID.
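
To make steps 1 to 3 more concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: `fake_encoder` stands in for the real text and image encoders, `fuse` is a simple average rather than the model's actual fusion, and the 8-dimensional vectors are far smaller than real embeddings. Step 4 (feeding the result into the cross-attention layers) is left out.

```python
import random

EMBED_DIM = 8  # toy size, chosen only for this example


def fake_encoder(seed: int, dim: int = EMBED_DIM) -> list[float]:
    """Stand-in for a real encoder: returns a deterministic toy vector."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def fuse(class_emb: list[float], image_emb: list[float]) -> list[float]:
    """Placeholder fusion: element-wise average (not the model's actual fusion)."""
    return [(c + i) / 2.0 for c, i in zip(class_emb, image_emb)]


def stack_id_embeddings(class_emb, image_embs):
    """Fuse each image embedding with the class embedding, then line the
    results up along the length dimension: one row per input image."""
    return [fuse(class_emb, img) for img in image_embs]


class_emb = fake_encoder(seed=0)                        # e.g. the class word "man"
image_embs = [fake_encoder(seed=s) for s in (1, 2, 3)]  # three ID photos
stacked = stack_id_embeddings(class_emb, image_embs)
print(len(stacked), len(stacked[0]))  # 3 8
```

The point to notice is the shape: N input photos produce a stack of N fused embeddings, so adding more photos lengthens the stack instead of requiring any retraining.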

Application Examples

  • Bringing Characters from Artworks/Old Photos to Reality: Given artistic paintings, sculptures, or old photos of a person as input, PhotoMaker can bring people from the last century or even ancient times into this century and “take” photos of them.
  • Stylization: It not only has the ability to generate realistic human photos but also allows for stylization while retaining ID attributes.
  • Changing Age or Gender: By simply replacing class words, PhotoMaker can change gender and age while maintaining the original identity.
  • Identity Mixing: New IDs can be customized in two ways. One is to control the proportion of each ID’s images in the input image pool. The other is to multiply the embeddings of the images belonging to a specific ID by a coefficient, which controls how strongly that ID contributes to the new one.
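
The two identity mixing controls can be sketched with toy numbers as follows. This is a hedged illustration, not PhotoMaker's code: `mix_identities` and `pool_share` are hypothetical helpers, and the 2-dimensional vectors stand in for real image embeddings.

```python
def mix_identities(embs_a, embs_b, weight_a=1.0, weight_b=1.0):
    """Scale each identity's embeddings by a coefficient, then pool them
    into a single stack (identity A's rows first, then identity B's)."""
    scaled_a = [[weight_a * x for x in emb] for emb in embs_a]
    scaled_b = [[weight_b * x for x in emb] for emb in embs_b]
    return scaled_a + scaled_b


def pool_share(n_a: int, n_b: int) -> float:
    """Fraction of the input image pool that belongs to identity A."""
    return n_a / (n_a + n_b)


# Toy embeddings: two photos of person A, one photo of person B.
person_a = [[1.0, 2.0], [3.0, 4.0]]
person_b = [[10.0, 20.0]]

mixed = mix_identities(person_a, person_b, weight_a=1.0, weight_b=0.5)
print(mixed)             # [[1.0, 2.0], [3.0, 4.0], [5.0, 10.0]]
print(pool_share(2, 1))  # 0.6666666666666666
```

Adding more photos of one person raises that person's share of the pool, while the coefficient adjusts the contribution without changing the number of photos.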

Comparisons and Advantages

Compared to other methods, PhotoMaker stands out for the quality and diversity of its generated images. It combines editability and high inference efficiency with strong ID fidelity. More comparison results can be found in the research team’s paper.

Conclusion

The release of PhotoMaker brings new breakthroughs to the field of personalized text-to-image generation. It not only improves generation efficiency and quality but also expands the application range, making the generated images more diverse and realistic. This technology has broad application prospects, whether in artistic creation, historical reenactment, or personalized avatar generation.