A user-controllable framework that unifies style transfer methods

rockstaryreviews

June 17, 2024

Neural style transfer is the use of neural networks to transfer the style of one input image — say, a famous painting — to another input image — say, a backyard photograph.

Researchers have proposed a number of different techniques for doing style transfer, but which one works best? There’s no single right answer; viewers’ opinions differ. In the results reported in prior papers on style transfer, the most-preferred methods rarely receive more than two-thirds of reviewers’ votes, while the least-preferred methods rarely receive fewer than 5%.

In a paper we presented at this year’s meeting of the Association for the Advancement of Artificial Intelligence (AAAI), my colleagues and I describe a new style transfer model that can output multiple options, controlled by a model parameter that the user selects.

We show that most prior approaches to style transfer can be rewritten in a standardized form that we call the assign-and-mix model. The model’s “assign” step involves an assignment matrix, which maps features of one input image to features of the other. In the paper, we show that the differences between style transfer techniques generally come down to the entropy of the assignment matrix, or the diversity of the matrix’s values.

Top: a content image (a swing) and a style image (van Gogh’s Starry Night); bottom: four candidate images generated by the Amazon researchers’ new style transfer model.

Finally, we show that, given a user-specified setting of the input parameter, an algorithm called Sinkhorn-Knopp can efficiently calculate the associated assignment matrix, enabling a diversity of outputs from the same style transfer model.

In a series of experiments, we compared our approach to its predecessors. We found that, according to standard metrics, our method did a better job of preserving the content of the content input and the style of the style input, and it produced more diverse outputs. We also conducted a study with 10 human evaluators and found that — at a particular setting of our diversity parameter — subjects preferred images generated by our method to those produced by other methods.

Assign and mix

In style transfer, the first step is to pass both the content example and the style example to the same visual encoder, which is typically pretrained on a broad object recognition task. The encoder produces a representation of each image, in which each image region has an associated feature vector.

The feature vectors will typically encode visual information — about, say, colors and orientations of gradients — but also semantic information — indicating, say, that a particular image region depicts part of an eye.

Style transfer typically involves (1) reshuffling elements of the style image to reproduce the content of the content image, (2) warping the content image so that its aggregate statistics resemble those of the style image, or (3) some combination of the two. We show that all such approaches can be expressed in the assign-and-mix model.

The “assign” step of assign-and-mix corresponds to approach (1). It involves the assignment matrix, which assigns feature vectors from the style representation to regions of a new image, guided by the content representation. Although prior style transfer approaches use a variety of techniques to find correspondences between style and content features, we analyze several of them in the paper and show that they can often be expressed in terms of an assignment matrix.
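To make the assign step concrete, here is a minimal NumPy sketch. It is an illustration only, not the paper's implementation: the cosine-similarity scoring, the softmax with a `temperature` parameter, and the function name `soft_assign` are all assumptions, and random vectors stand in for real encoder features.

```python
import numpy as np

def soft_assign(content_feats, style_feats, temperature=0.1):
    """Build a row-stochastic assignment matrix (hypothetical helper).

    Entry [i, j] is the weight that content region i places on style
    feature j. Cosine similarity plus softmax is an assumed choice.
    """
    c = content_feats / np.linalg.norm(content_feats, axis=1, keepdims=True)
    s = style_feats / np.linalg.norm(style_feats, axis=1, keepdims=True)
    logits = (c @ s.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
content_feats = rng.normal(size=(6, 8))  # 6 content regions, 8-dim features
style_feats = rng.normal(size=(5, 8))    # 5 style regions, 8-dim features

assignment = soft_assign(content_feats, style_feats)
# Each region of the new image is a weighted combination of style features.
f_s_to_c = assignment @ style_feats
print(assignment.shape, f_s_to_c.shape)
```

The matrix product is the key point: whatever procedure produces the assignment matrix, applying it to the style features yields a reconstruction of the content built purely from style features.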

The assignment for a particular point in the new image may be a single vector from the style encoding, or it may be a weighted combination of vectors. In the first case, the assignment matrix is binary: every matrix entry is either 0 or 1. This is a minimal-entropy assignment.

By contrast, if every point in the new image consists of a weighted combination of every vector in the style image, the assignment matrix has higher entropy. There are existing style transfer approaches with binary assignment matrices, and there are existing approaches with high-entropy matrices, and our method can approximate both.
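The two extremes can be checked numerically. The sketch below assumes that each row of the assignment matrix is a probability distribution over style features and measures Shannon entropy per row; `row_entropy` is a hypothetical helper name, and the 4×4 matrices are toy examples.

```python
import numpy as np

def row_entropy(assignment):
    """Shannon entropy (in nats) of each row of an assignment matrix."""
    safe = np.clip(assignment, 1e-12, 1.0)  # avoid log(0); 0 * log(.) stays 0
    return -(assignment * np.log(safe)).sum(axis=1)

# Binary assignment: each new-image region copies exactly one style vector.
binary = np.eye(4)
# Uniform assignment: each region averages all four style vectors equally.
uniform = np.full((4, 4), 0.25)

binary_entropy = row_entropy(binary)
uniform_entropy = row_entropy(uniform)
print(binary_entropy, uniform_entropy)
```

A binary row has entropy 0, the minimum, while a uniform row over n style vectors has entropy log n, the maximum. Any weighted combination falls between the two.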

After the assignment step, we proceed to the mixing phase, which corresponds to approach (2), above. In this phase, we step through the encoding of the new, synthetic image, and for each image region, we measure the distance between its encoding and that of the original content example. Then we mix in the feature vectors from the original content encoding, in proportion to the degree of divergence. This ensures that the new image preserves the content of the original.
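A rough sketch of the mixing phase follows. The description above does not pin down the distance measure or the mixing rule, so the choices here are assumptions: cosine distance rescaled to [0, 1] serves as the divergence, and a linear blend uses it as the mixing weight. The function name `mix` is hypothetical.

```python
import numpy as np

def mix(assigned_feats, content_feats):
    """Blend content features back in where the assigned features diverge.

    Assumed rule: weight = cosine distance / 2, so identical directions
    give weight 0 and opposite directions give weight 1.
    """
    a = assigned_feats / np.linalg.norm(assigned_feats, axis=1, keepdims=True)
    c = content_feats / np.linalg.norm(content_feats, axis=1, keepdims=True)
    cosine_distance = 1.0 - (a * c).sum(axis=1)           # in [0, 2]
    weights = np.clip(cosine_distance / 2.0, 0.0, 1.0)[:, None]
    mixed = (1.0 - weights) * assigned_feats + weights * content_feats
    return mixed, weights

content_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
# Region 0 already matches the content; region 1 points the opposite way.
assigned_feats = np.array([[1.0, 0.0], [0.0, -1.0]])

mixed, weights = mix(assigned_feats, content_feats)
print(weights.ravel())
print(mixed)
```

In this toy case the first region is left untouched, because it already agrees with the content, while the second region is replaced entirely by the content feature, because it diverges as far as possible.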

The proposed approach. Epsilon is the parameter used to control the range of entropy values for the assignment matrix; f_s→c is the reconstruction of the content image, using features of the style image, produced by the assignment matrix.

The computational bottleneck in this process is the creation of multiple assignment matrices, with different degrees of entropy. But we show in our paper that the Sinkhorn-Knopp algorithm, which iteratively rescales the rows and columns of a matrix until its row and column sums match prescribed values, can be applied to the problem of constructing assignment matrices efficiently.
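For readers who want to see the mechanics, here is a minimal sketch of Sinkhorn-Knopp scaling with an entropy-controlling parameter. It follows the standard entropy-regularized formulation and is not taken from the paper: the uniform row and column targets, the `exp(-cost / epsilon)` kernel, the random cost matrix, and the names `sinkhorn_knopp` and `entropy` are all assumptions for illustration.

```python
import numpy as np

def sinkhorn_knopp(cost, epsilon, n_iters=1000):
    """Alternately rescale rows and columns of exp(-cost / epsilon).

    Assumes uniform targets: every row should sum to 1/n_rows and every
    column to 1/n_cols. Larger epsilon gives a flatter, higher-entropy plan.
    """
    n_rows, n_cols = cost.shape
    row_target = np.full(n_rows, 1.0 / n_rows)
    col_target = np.full(n_cols, 1.0 / n_cols)
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n_rows)
    v = np.ones(n_cols)
    for _ in range(n_iters):
        u = row_target / (kernel @ v)    # fix the row sums
        v = col_target / (kernel.T @ u)  # fix the column sums
    return u[:, None] * kernel * v[None, :]

def entropy(plan):
    """Shannon entropy (in nats) of the whole matrix."""
    return float(-(plan * np.log(np.clip(plan, 1e-300, None))).sum())

rng = np.random.default_rng(0)
cost = rng.random((4, 4))  # stand-in for content-to-style feature distances

sharp_plan = sinkhorn_knopp(cost, epsilon=0.2)   # lower entropy
smooth_plan = sinkhorn_knopp(cost, epsilon=5.0)  # close to uniform

# Rescale so each row sums to about 1 and can act as mixing weights.
sharp_assignment = sharp_plan * cost.shape[0]
smooth_assignment = smooth_plan * cost.shape[0]
print(entropy(sharp_plan), entropy(smooth_plan))
```

Running the same routine with different values of `epsilon` yields a family of assignment matrices from one cost matrix, which is what makes it practical to offer the user several candidate outputs.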

In the paper, we rewrite three prior style transfer methods using the assign-and-mix format. We selected those methods because their assignment matrices cover the full spectrum of entropies. Our method should therefore also be able to approximate the outputs of any style transfer model whose assignment-matrix entropy falls within a narrower range.
