Yuchen Tian's zi2zi-JiT can generate an entire Chinese font set from a few dozen reference characters in under a day. That's not a trivial claim — Chinese fonts require thousands of individual glyphs, each with intricate stroke patterns that have to hold together visually across sizes and weights. Japanese and Korean add their own complications.
The system is a conditional diffusion transformer: feed it a source character and a sample glyph in the target style, and it produces the character in that style. Tian built it on top of JiT (Just image Transformer), a pixel-space architecture introduced in a November 2025 MIT paper by Tianhong Li and Kaiming He, adding three components: a Content Encoder, a Style Encoder, and a Multi-Source In-Context Mixing module that concatenates font, style, and content embeddings into a single conditioning sequence.
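The mixing step described above amounts to joining three groups of embedding tokens along the sequence axis. The sketch below illustrates only that idea, not Tian's code: the function name `mix_in_context`, the token counts, and the embedding width of 4 are all assumptions made for the example.

```python
from typing import List

Token = List[float]  # one embedding vector


def mix_in_context(font_tokens: List[Token],
                   style_tokens: List[Token],
                   content_tokens: List[Token]) -> List[Token]:
    """Concatenate the three token groups into one conditioning sequence.

    Hypothetical helper: every token must share the same width so the
    transformer can attend over the result as a single sequence.
    """
    groups = [font_tokens, style_tokens, content_tokens]
    widths = {len(tok) for group in groups for tok in group}
    if len(widths) != 1:
        raise ValueError(f"token widths differ: {sorted(widths)}")
    # Order is font, then style, then content (an assumption of this sketch).
    return [tok for group in groups for tok in group]


# Toy inputs: 1 font token, 2 style tokens, 3 content tokens, width 4.
font = [[0.1, 0.2, 0.3, 0.4]]
style = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
content = [[0.5] * 4, [0.6] * 4, [0.7] * 4]

conditioning = mix_in_context(font, style, content)
print(len(conditioning), "tokens of width", len(conditioning[0]))
```

In a real model the three groups would come from learned encoders and the sequence would be consumed by the diffusion transformer's attention layers; here they are hand-written lists so the shape bookkeeping is easy to follow.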
Two variants, JiT-B/16 and JiT-L/16, were pretrained for 2,000 epochs on more than 400 fonts covering simplified Chinese, traditional Chinese, and Japanese, totaling over 300,000 character images. On a benchmark of 2,400 ground-truth pairs, the larger variant scores 0.6794 on SSIM (higher is better) and 0.1967 on LPIPS (lower is better), putting it roughly in line with FontDiffuser, which appeared at AAAI 2024.
The practical pitch is the LoRA fine-tuning pipeline. Adapting the model to a new font takes less than an hour on a single H100 GPU and uses about 4 GB of VRAM at batch size 16, a footprint small enough to be usable outside a well-resourced research lab.
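Why LoRA keeps fine-tuning cheap is easiest to see in a small sketch. In the standard LoRA formulation, a frozen weight matrix W is adjusted by a low-rank product, giving W + (alpha / r) * B * A, and only A and B are trained. The code below is a pure-Python illustration of that formula. The layer width of 768 and rank of 8 are assumptions for the example, not settings taken from zi2zi-JiT.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def lora_effective_weight(w: Matrix, down: Matrix, up: Matrix,
                          alpha: float) -> Matrix:
    """Return W + (alpha / r) * up @ down.

    down is r x d_in (often called A), up is d_out x r (often called B).
    """
    rank = len(down)
    scale = alpha / rank
    delta = matmul(up, down)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def lora_param_counts(d_out: int, d_in: int, rank: int) -> Tuple[int, int]:
    """(parameters in the full matrix, trainable parameters under LoRA)."""
    return d_out * d_in, rank * (d_in + d_out)


# Assumed sizes for illustration only: one 768 x 768 layer, rank 8.
full, trainable = lora_param_counts(768, 768, 8)
print(f"full: {full}, LoRA: {trainable}, share: {trainable / full:.2%}")
```

With these assumed sizes the adapter trains roughly 2% of the layer's parameters. Because optimizer state and gradients are only kept for the adapter matrices, this is the usual reason LoRA fine-tuning fits in a small memory budget.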
Tian demonstrated what that means in practice with a companion project called Zi-QuanHengDuLiang (权衡度量体). Starting from 338 glyphs hand-extracted from a Qing Dynasty manuscript, he used zi2zi-JiT to reconstruct all 6,763 characters in the GB2312 set. The characters were generated at 256×256 pixels, vectorized with Potrace, and assembled into distributable TTF and OTF files using FontForge. The whole process, including model training, took roughly five days.
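To put those numbers in perspective, 338 reference glyphs cover about 5% of GB2312, leaving 6,425 characters for the model to generate. The sketch below shows one way to enumerate the GB2312 Chinese characters and list those still missing, using Python's built-in `gb2312` codec. The helper names and the stand-in reference set are hypothetical. The actual 338 characters from the manuscript are not listed in this article.

```python
from typing import Iterable, List


def gb2312_hanzi() -> List[str]:
    """Enumerate the Chinese characters in GB2312 (rows 16 to 87).

    Lead bytes 0xB0..0xF7 and trail bytes 0xA1..0xFE span the hanzi area.
    Code points the codec cannot decode are unassigned and are skipped.
    """
    chars = []
    for lead in range(0xB0, 0xF8):
        for trail in range(0xA1, 0xFF):
            try:
                chars.append(bytes([lead, trail]).decode("gb2312"))
            except UnicodeDecodeError:
                pass  # unassigned slot in the table
    return chars


def missing_glyphs(reference: Iterable[str]) -> List[str]:
    """Characters in GB2312 that are not in the reference set."""
    have = set(reference)
    return [c for c in gb2312_hanzi() if c not in have]


all_chars = gb2312_hanzi()
# Stand-in for the hand-extracted glyphs: simply the first 338 characters.
stand_in_reference = all_chars[:338]
todo = missing_glyphs(stand_in_reference)
print(len(all_chars), "characters in the set;", len(todo), "left to generate")
```

A list like `todo` is what a generation script would iterate over before handing the rendered bitmaps to a vectorizer such as Potrace.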
That example points to something more interesting than generative typography for its own sake. Historical scripts and rare writing systems often survive only in manuscripts where usable reference material is thin — exactly the conditions zi2zi-JiT is built for. The code is on GitHub under an MIT license, with a font artifact addendum that requires attribution for commercial distributions of more than 200 generated characters.