AMD has released a new Stable Diffusion 3 Medium artificial intelligence (AI) model optimised for XDNA 2 neural processing units (NPUs). The chipmaker claimed that it is the world’s first AI model that processes outputs in the block FP16 (BF16) format. The model will run on newer Ryzen AI laptops with at least 24GB of RAM once users install Tensorstack’s Amuse 3.1 beta software. Stable Diffusion 3 Medium is an on-device image generation model that does not require Internet connectivity.
AMD’s Image Generation Model Can Generate Print-Ready Images
In a press release, the Santa Clara-based tech giant detailed the new image generation model. The AI model is based on Stable Diffusion 3 Medium and is optimised for the company’s XDNA 2 NPUs, which are found in Ryzen AI laptops released in 2024 and later.
The company claims the model can be used to generate stock-quality images from text prompts. The model generates 1024×1024 resolution images, which are then upscaled to 2048×2048 print-ready resolution using the NPU’s capabilities.
The new AI model is part of AMD and Tensorstack’s new Amuse 3.1 desktop app, which is free to download and install. Since the image generation model runs entirely locally, it works even when the device is not connected to the Internet. All data processing occurs on-device, powered by the XDNA 2 NPU.
AMD said it has reduced the memory requirements of the AI model, which now needs 24GB of RAM instead of the 32GB that was necessary for the Stable Diffusion XL Turbo model. Additionally, the new image model consumes only 9GB of RAM while active. The company achieved this by using the memory-efficient block floating point 16 format, also called block FP16 (BF16).
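For readers unfamiliar with block floating point, the general idea can be sketched in a few lines of Python. This is only an illustration of the technique, not AMD's implementation: the press release does not describe the actual block size, mantissa width or memory layout, so the 8-value block, 8-bit mantissa and 8-bit shared exponent below are assumptions, and the function names are made up for this example.

```python
import math

def quantize_block(values, mantissa_bits=8):
    """Quantize a block of floats to integer mantissas plus ONE shared exponent.

    Assumed parameters (not from AMD): signed mantissas of `mantissa_bits` bits.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**exp with 0.5 <= m < 1,
    # so every value in the block has magnitude below 2**exp.
    _, exp = math.frexp(max_abs)
    step = 2.0 ** exp / 2 ** (mantissa_bits - 1)  # size of one mantissa step
    limit = 2 ** (mantissa_bits - 1) - 1          # largest signed mantissa
    mantissas = [max(-limit, min(limit, round(v / step))) for v in values]
    return exp, mantissas

def dequantize_block(exp, mantissas, mantissa_bits=8):
    """Rebuild approximate floats from the shared exponent and mantissas."""
    step = 2.0 ** exp / 2 ** (mantissa_bits - 1)
    return [m * step for m in mantissas]

def bits_per_value(block_size, mantissa_bits=8, exponent_bits=8):
    """Average storage cost: each value keeps its own mantissa,
    while the exponent is paid for once per block."""
    return mantissa_bits + exponent_bits / block_size

block = [0.5, -0.25, 0.125, 0.0]
shared_exp, mants = quantize_block(block)
restored = dequantize_block(shared_exp, mants)
```

With these assumed parameters, a block of eight values costs 9 bits per value on average instead of 16, which is where the memory saving in a scheme of this kind comes from. Small values that share a block with a much larger one lose precision, because the step size is set by the largest magnitude in the block.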
The tech giant highlighted that the Stable Diffusion 3 Medium AI model is highly sensitive to the wording, structure, and order of the prompt. AMD said users trying out the model should first describe the type of image, then the structural components, and finally the details and other context. Negative prompts can be used to remove elements from the image, and the placement of full stops can change how the model interprets the prompt.
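The recommended ordering can be sketched as a small Python helper that assembles a prompt in the sequence AMD describes: image type first, then structural components, then details. The helper names and the comma separator are assumptions made for illustration; they are not part of Amuse or any AMD tool, and the sketch only builds strings that a user could paste into the app.

```python
def build_prompt(image_type, structure, details):
    """Join prompt parts in the order: image type, structure, details.

    Hypothetical helper. Commas are used as separators (an assumption),
    since full stops can change how the model reads the prompt.
    """
    parts = [image_type, *structure, *details]
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return ", ".join(cleaned)

def build_negative_prompt(unwanted):
    """Join the elements that should be kept out of the image."""
    return ", ".join(u.strip() for u in unwanted if u and u.strip())

prompt = build_prompt(
    "stock photo",                           # 1. type of image
    ["a lighthouse on a rocky cliff"],       # 2. structural components
    ["golden hour lighting", "calm sea"],    # 3. details and other context
)
negative = build_negative_prompt(["people", "text"])
```

Keeping the three groups as separate arguments makes it easy to reorder or drop one of them and compare how the generated image changes.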