Do Transformers Hallucinate of Softer Skin?

Anshumani Ruddra

1 year ago

Cover image generated on Dall-E with the prompt: “A robot similar to Optimus Prime standing on a barren landscape looking at his human hand, cartoon style”

All new technology in its early phase is highly disruptive to the status quo. I call this the “dynamite” phase – you can use it to blow things to smithereens or tunnel through mountains and move humanity forward. AI, especially generative AI, is in its “dynamite” phase. Artists and writers – people who make a living productizing their creativity – are alarmed. What do all of these new developments mean for their livelihoods?

As someone who spent half a dozen years writing for a living and has then spent the next dozen+ years building consumer tech products (games, healthcare, education, media and payments) – this is a topic close to my heart. Creating something is a personal act. It requires craft, years of deliberate practice and patience. Writers, artists, coders and designers – all share these traits. We learn from other practitioners – through reading, viewing, observing, listening and most importantly doing – and over time develop what we believe is our own unique style.

Every artist is afraid that their work might be too derivative and not original. The fear of being called a copycat looms large. And yet – we all learnt through imitation. No one questions a child (or even an adult messing around with an Apple Pencil) who looks at van Gogh’s Sunflowers and tries to copy it.

From the series “Sunflowers” by Vincent van Gogh

The first stories I ever wrote sounded exactly like stories from the Panchatantra and Enid Blyton’s school stories. As a writer of fantasy fiction, it took years to not blindly copy Pratchett or write really terrible poems like Tolkien (I instead imitated Vikram Seth’s Beastly Tales). After what felt like a million or so words (handwritten and typed) – I finally started feeling confident in my ability to be “original”.

But generative AI worries me. Could an LLM (large language model) be trained on fiction written by some of the best writers in the world and “create” something new – something not derivative and perhaps original? What about music and art? If you are an artist, could generative AI help anyone produce art in your signature style? Could this potentially lead to a loss of income for artists?

Multiple conversations later I realised that other people had similar questions about the economics, ethics and the evolving policies around this field as well.

This needed a deep dive. I had to learn and then I had to distil down my thoughts. And then share. Think of the rest of this essay as a trigger. It might appeal to you as someone who is an artist trying to make sense of this new “dynamite” phase. It might appeal to your curiosity as a technologist and product builder. Heck – you might be an AI expert who wants to see how the rest of the world reacts to what you are building. Or – you are most likely a bystander wondering what all this ruckus is really about.

Fasten your seatbelts. We are diving right in.

I. A quick primer on generative AI, LLMs, diffusion models and where we are currently

This portion of the essay was written using a bunch of generative text AI tools available in the market. It just made sense to use the power of Large Language Models (LLMs) to write about their capabilities. I have edited heavily, checked for accuracy and rewritten parts wherever the flow did not feel right. But overall – I find these tools (especially ones that summarize larger pieces of text) to be very worthy writing assistants. Good writing is not going anywhere – it will become stronger and more widespread as a result of these evolving technologies.

If you have a good understanding of generative AI, skip directly to the next section on copyright law.

Generative AI

Generative AI is a type of artificial intelligence that can create new content, such as text, images, and music. It does this by using a neural network model to learn the patterns and relationships in the content it is trained on. Once it has learned these patterns, it can then use them to create new content that is similar to the content it was trained on.

There are three broad terms to understand here:

Data
Model
Application

Data is an essential component of artificial intelligence. It comprises a collection of facts, figures, or any other information that can be in the form of text, images, or sounds. One of the primary uses of data in AI is to train models, which are mathematical representations of real-world phenomena. These models are then used to make predictions or decisions based on the data they have been trained on. For instance, a model trained on a large dataset of satellite images can predict weather patterns with a decent degree of accuracy.

Applications are the way in which humans interact with AI models and their underlying data sets to solve problems and get things done. Applications range from speech recognition software that can process natural language to self-driving cars that use computer vision to navigate roads. In recent years, AI has been applied to a wide range of industries, including finance, healthcare, and transportation, to name a few. As such, it has become an increasingly important area of research and development.

Data —> Model —> Application

Example:

Data: text sources on the internet

Model: OpenAI’s GPT-3

Application: ChatGPT
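To make the Data —> Model —> Application chain concrete, here is a toy sketch in Python. It is emphatically not how GPT-3 works (that is a large neural network); it swaps in a simple word-pair table as the "model". The corpus and the function names are invented for illustration.

```python
import random
from collections import defaultdict

# Data: a tiny corpus standing in for "text sources on the internet".
CORPUS = "the cat sat on the mat . the dog sat on the rug ."

def train_bigram_model(text):
    """Model: for each word, record which words followed it in the data."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)

def generate(model, start, max_words=8, seed=0):
    """Application: produce new text by repeatedly sampling a next word."""
    rng = random.Random(seed)
    output = [start]
    while len(output) < max_words and output[-1] in model:
        output.append(rng.choice(model[output[-1]]))
    return " ".join(output)

model = train_bigram_model(CORPUS)
print(generate(model, "the"))
```

Every word pair in the output was seen in the training data, yet the sentence as a whole may be new. That is the "similar to what it was trained on" property in miniature.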

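In supervised learning, data is used to train models by providing them with labelled examples. For example, a classifier (a type of model) could be trained on a dataset of emails, where each email is labelled as spam or not spam. The model would then learn to predict the class label for a given input email. (Language models such as GPT-3 are instead trained to predict the next word in raw text, so the text itself supplies the labels.)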

Models are evaluated based on their performance on a test set. The test set is a set of data that is not used to train the model. This allows us to get an unbiased estimate of the model’s performance.
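A minimal sketch of both ideas, training on labelled emails and then scoring on a held-out test set, might look like this. The emails are invented, and the word-counting "model" is a deliberately crude stand-in for a real classifier.

```python
from collections import Counter

# Labelled examples (invented for illustration): (email text, label).
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting notes for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
    ("project notes and meeting agenda", "not spam"),
]
# Held-out test set: never shown to the model during training.
TEST = [
    ("free prize money", "spam"),
    ("agenda for the team meeting", "not spam"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def predict(counts, text):
    """Pick the label whose training emails shared more words with this one."""
    scores = {label: sum(c[w] for w in text.split()) for label, c in counts.items()}
    return max(scores, key=scores.get)

def accuracy(counts, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(counts, text) == label for text, label in examples)
    return correct / len(examples)

model = train(TRAIN)
print("test accuracy:", accuracy(model, TEST))
```

Scoring on TRAIN instead of TEST would flatter the model, because it has already seen those exact emails.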

LLMs and Natural Language Processing

One of the big problems that these models are trying to solve is in understanding natural language. Enter LLMs. Large language models (LLMs) are a type of artificial intelligence that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way, even if they are open-ended, challenging, or strange.

Large language models are tens to hundreds of gigabytes in size and trained on enormous amounts of text data. They can be used for zero-shot scenarios or few-shot scenarios where little domain-[tailored] training data is available, and their performance continues to scale as more parameters are added to the model.
Fine-tuned models leverage existing large language models but require less time/compute power to train or run; they also don’t need as much data as large language models do. Fine-tuned models are used for specific purposes: like generating code. [Example: GitHub Copilot is powered by the OpenAI Codex, which is a modified, production version of an LLM: GPT-3. The Codex model is additionally trained on gigabytes of source code in a dozen programming languages.]
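The difference between zero-shot and few-shot use is easiest to see in the prompt itself. The sketch below only builds the text that would be sent to a model; it calls no real API, and the "Review / Sentiment" layout is an assumed format chosen for illustration.

```python
def build_prompt(task, examples, query):
    """Assemble a text prompt; an empty `examples` list gives a zero-shot prompt."""
    lines = [task]
    for text, label in examples:
        # Each worked example shows the model the pattern to continue.
        lines.append(f"Review: {text}\nSentiment: {label}")
    # The final entry is left unanswered for the model to complete.
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

TASK = "Classify the sentiment of each review as positive or negative."
EXAMPLES = [
    ("I loved every page.", "positive"),
    ("A dull, lifeless plot.", "negative"),
]

zero_shot = build_prompt(TASK, [], "The ending made me cry with joy.")
few_shot = build_prompt(TASK, EXAMPLES, "The ending made me cry with joy.")
print(few_shot)
```

In the few-shot case the model's weights do not change at all; the handful of examples lives only in the prompt. Fine-tuning, by contrast, actually updates the model.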

The most popular LLM right now is GPT-3. It is a third-generation Generative Pre-trained Transformer, a neural network machine learning model developed by OpenAI trained using internet data to generate any type of text.

It has 175 billion machine learning parameters and can be used for natural language generation and processing tasks such as creating articles, poetry, stories, news reports and dialogue; generating summaries; writing code snippets; finding bugs in existing code; mocking up websites; translating between languages; or performing sentiment analysis.
Benefits include its task agnosticism (it can perform many different tasks without fine-tuning) and its availability through an API: the model itself is far too large for consumer laptops or smartphones and runs on data-centre hardware, but applications on those devices can call it over the internet.
Limitations and risks include its fixed pre-training (it does not keep learning from conversations and has no long-term memory), limited input size and slow inference, as well as mimicry of its training data, which leads to issues with factual accuracy and bias.

The rapid development of machine learning and AI technologies is leading to a new era of natural language processing (NLP). Large-scale language models, such as Google’s BERT and OpenAI’s GPT-3, are increasingly being used to power a variety of applications including search, conversational interfaces, and text generation. These models are trained on large datasets and can easily process huge volumes of text, capturing much of the context and intent behind words. This often enables them to generate more accurate results than traditional methods of NLP.

But better models are just around the corner.

Image Generation

Another exciting field in generative AI is image generation. Generative AI models, such as DALL-E 2, use a technique called diffusion modelling to produce images.

Diffusion models are trained by first ruining training images with random noise and then learning to undo that damage. To generate a new image, the model starts from pure noise and rebuilds an image through a series of steps that reduce noise while increasing its meaning.
The model is trained by adjusting parameters within neural networks so that it can take meaningless images and evolve them into something meaningful.
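The noising half of this process is simple enough to sketch. The code below assumes the commonly used Gaussian formulation with a linear noise schedule; the essay does not say which formulation any particular product uses, and the schedule values and four-pixel "image" are illustrative. The denoising function here is handed the true noise, which is exactly the quantity a trained neural network would have to estimate.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative fraction of signal kept after each step of a linear noise schedule."""
    result, keep = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        keep *= 1.0 - beta
        result.append(keep)
    return result

def add_noise(pixels, alpha_bar, rng):
    """Forward process: blend the image with Gaussian noise in one jump."""
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
             for p, n in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, noise_estimate, alpha_bar):
    """Invert the blend given a noise estimate (a trained network would supply it)."""
    return [(x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
            for x, n in zip(noisy, noise_estimate)]

rng = random.Random(0)
image = [0.0, 0.5, 1.0, 0.5]  # a four-pixel "image"
schedule = alpha_bars(1000)
noisy, true_noise = add_noise(image, schedule[499], rng)
recovered = remove_noise(noisy, true_noise, schedule[499])
print("signal kept at final step:", schedule[-1])
```

By the last step almost none of the original signal survives, which is why generation can start from pure noise. All of the difficulty, and all of the training, sits in estimating the noise well.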

Predicting the performance or understanding the workings of generative AI models is difficult. Their outputs are often judged largely by whether they look good or not.

These models can be considered highly capable imitators, like a smart parrot, since they do not truly understand language or comprehend real landscapes. Despite this, they create realistic-looking outputs from statistical mashups alone.

II. What is copyright law and what are its tenets?

Copyright and intellectual property law are not uniform across the world. Each country has its own implementation. The following are the most common aspects across the world.

Copyright law is a form of intellectual property law that protects original works of authorship, including literary, dramatic, musical, and artistic works, such as poetry, novels, movies, songs, computer software, and architecture. Copyright law gives the author of a work the exclusive right to reproduce, distribute, and perform the work, as well as to create derivative works based on the work.

The most important aspects of copyright law are:

The idea/expression dichotomy: Copyright law protects the expression of an idea, but not the idea itself. This means that anyone can use the same idea as someone else, but they cannot copy the expression of that idea.
The fair use doctrine: The fair use doctrine allows for the use of copyrighted material without permission from the copyright holder in certain limited circumstances, such as for purposes of criticism, commentary, or education.
The first sale doctrine: The first sale doctrine allows for the resale of copyrighted materials without permission from the copyright holder.
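The term of copyright: The term of copyright varies depending on the country, the type of work and the date of creation. In many jurisdictions, including the United States and the European Union, the term for most works is the author’s life plus 70 years; the Berne Convention sets a minimum of life plus 50 years.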
The copyright notice: A copyright notice is not required to obtain copyright protection, but it can provide notice to potential infringers and may be required in some countries as a condition of protection.
Registration: Registration of a copyright is not required, but it can provide additional benefits, such as the ability to file a lawsuit for infringement.

Fair use is a legal doctrine that permits the use of copyrighted material without permission in certain circumstances, promoting freedom of expression and creativity.

The doctrine of fair use is based on the idea that copyright law should not be used to stifle creativity, and that in some cases, it is in the public interest to allow limited use of copyrighted material without permission from the copyright holder.

To determine whether a use is fair, courts consider four factors:

The purpose and character of the use, including whether it is commercial or non-commercial, and whether it is transformative.
The nature of the copyrighted work.
The amount and substantiality of the portion of the work that is used.
The effect of the use on the potential market for or value of the copyrighted work.

No single factor determines whether a use is fair. Courts must consider all factors and weigh them in light of each case’s specific facts.

Some uses are textbook examples of fair use. For example, in the United States, quoting a copyrighted work in a review or criticism of that work is typically fair use. Similarly, using copyrighted work for news reporting often qualifies. However, no category is automatically fair: a use may fail the test even if it falls under one of these purposes. For example, copying an entire copyrighted work and distributing it for free, even for educational purposes, is unlikely to be fair use.

In general, the more transformative a use is, the more likely it is to be considered fair use. Transformative uses add something new and original to the copyrighted work, such as by commenting on it, criticizing it, or parodying it. Conversely, uses that aren’t transformative, such as copying the work exactly as it is, are less likely to be considered fair use.

It’s important to note that it is the user who must prove that their use of copyrighted material falls within the fair use doctrine.

III. How will copyright clash with training data sets and the output of generative AI?

As AI models and applications become more and more sophisticated, the potential for them to be used for creative expression is increasing. With this potential comes the question: under what conditions is the usage of images and text as training data, copyrighted or not, fair or not? For example, if a large language model is trained on a dataset of works of long-dead authors, is that considered fair usage? What about living authors and artists?

The idea of fair usage states that transformation of the original material must be sufficient. How do we decide if the result is sufficiently different from the original for it to qualify as fair usage when it comes to generative art? How do we measure the substantiality of the original work that is used as training data by a model? While ideas can be copied, expression cannot, and this is where the world of art, writing, and creation will clash with generative AI and copyright law.

There’s another important question to consider: how will attribution and monetization work? Copyright tags – like Creative Commons – are used to state whether a piece of work requires or does not require attribution and whether it can or cannot be used for commercial purposes. But how can we ensure that original artists and creators are attributed and receive payments when their work is used as training data to generate new pieces of commercial art?

Clearing rights in this way is already common practice (even though not perfectly implemented) in the world of music sampling. Sampling refers to the act of taking a portion of a sound recording and reusing it by incorporating it into the recording of a new song. This is common in genres in which artists typically use pre-recorded music and sounds to create new work (hip hop, EDM, etc). If someone wants to sample a sound recording, they must obtain permission from both the copyright owner of the song (the music publisher(s)) and the copyright owner of the particular recording of that song (the record label) to avoid copyright infringement.

It will be important for creators and technology companies to work together to find a solution that ensures copyright law is respected while still allowing for the development of generative AI technology.

We are already seeing the opening salvo by copyright holders against AI model creators. Getty Images recently filed a lawsuit against Stability AI for allegedly using 12 million of its images to train the company’s image generation model.

From the original complaint filed by Getty Images against Stability AI

IV. What could be the way forward?

The intersection of policy, copyright law, creator economy and AI will need new champions. Policymakers have traditionally been incredibly weak in understanding evolving technologies. Old-school creators mistrust technologists and technology. And technologists do inadvertently blow things to smithereens before moving humanity forward.

The way forward will require us to solve the following three problems (and fairly quickly):

Educating Creators: It is going to become imperative for creators to understand how AI models could use their work as training data. As more technologists start working in AI, explaining its nuances to people in other fields will become extremely important.
Participation with the Right Terms: Creators will also need tools which allow them to tag and mark their work as AI-ready:
- Can/cannot be used as training data for AI models
- Attribution required/not required
- Can be used for commercial/non-commercial purposes
- Way to pay the original creators
A bunch of people have been thinking along these lines, but a real solution is still a distant dream.

"ai.txt"

usage: allowed-all| allowed-non-commercial
attribution: always|never|when-commercial
attribution-id: {DNS, email-id, name}
payment: {eth-wallet, key}

and you can stick a compressed form on IG/Tiktok/Deviant/etc

(based on an idea from @cdixon)
— Sriram Krishnan – sriramk.eth (@sriramk) February 7, 2023
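To show how a crawler or model builder might consume such a file, here is a hedged Python sketch. "ai.txt" is only the proposal from the tweet above, not an existing standard. The sample file, the function names, and the choice to treat a missing "usage" line as "not allowed" are all assumptions made for illustration.

```python
# A hypothetical ai.txt a creator might publish, using the keys from the tweet.
SAMPLE_AI_TXT = """\
usage: allowed-non-commercial
attribution: always
attribution-id: artist@example.com
"""

# The values the tweet lists for the two enumerated keys.
ALLOWED_VALUES = {
    "usage": {"allowed-all", "allowed-non-commercial"},
    "attribution": {"always", "never", "when-commercial"},
}

def parse_ai_txt(text):
    """Parse 'key: value' lines into a dict, rejecting unknown enum values."""
    policy = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]:
            raise ValueError(f"unsupported {key} value: {value!r}")
        policy[key] = value
    return policy

def may_train(policy, commercial):
    """Assumed opt-in rule: training is allowed only if the creator said so."""
    usage = policy.get("usage")
    if usage == "allowed-all":
        return True
    return usage == "allowed-non-commercial" and not commercial

def needs_attribution(policy, commercial):
    """Assumed default when the key is absent: always attribute."""
    rule = policy.get("attribution", "always")
    return rule == "always" or (rule == "when-commercial" and commercial)

policy = parse_ai_txt(SAMPLE_AI_TXT)
print(may_train(policy, commercial=True), needs_attribution(policy, commercial=False))
```

Parsing is the easy part. The hard parts are agreeing on the vocabulary, getting model builders to honour it, and wiring the payment field to something real.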

Generating Value for Creators:
1. Attribution – Attributing original creators whose work was used while generating a new piece of work will be a challenge. Stable Attribution is one example of a tool that helps find the human creators behind an AI-generated image.
2. Proof of Ownership – Proof of ownership or original creation is an important aspect to consider, especially when it comes to digital content. One possible solution to this issue could be a blockchain-based system that verifies the authenticity and uniqueness of each piece of content. This would not only help with attribution and copyright protection but could also open up new opportunities for monetization and revenue generation in a digital marketplace. By leveraging the transparent and decentralized nature of blockchain technology, creators and owners of digital content could have greater control and ownership over their work, leading to a more fair and sustainable online ecosystem for all parties involved.
3. Monetization – It will also be important for technology companies that are developing these AI models and applications to figure out how to ensure that creators of original work don’t lose out on potential income and that these applications actually help everyone become better creators.
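The proof-of-ownership idea in point 2 rests on two building blocks: a fingerprint of the content and a tamper-evident record of who registered it first. Here is a toy, in-memory Python sketch of both. It is not a real blockchain (there is no network and no consensus), and the class and method names are invented. Note also that an exact hash only matches byte-identical copies, and that registering first is evidence of a claim, not proof of authorship.

```python
import hashlib
import json

def fingerprint(content: bytes) -> str:
    """SHA-256 digest that identifies a piece of content."""
    return hashlib.sha256(content).hexdigest()

class Ledger:
    """Append-only list of records, each linked to the previous one by hash."""

    def __init__(self):
        self.records = []

    def register(self, creator, content):
        """Append a record tying the creator's name to the content's fingerprint."""
        prev_hash = self.records[-1]["record_hash"] if self.records else "0" * 64
        body = {"creator": creator, "content_hash": fingerprint(content),
                "prev_hash": prev_hash}
        record_hash = fingerprint(json.dumps(body, sort_keys=True).encode())
        self.records.append({**body, "record_hash": record_hash})

    def first_claimant(self, content):
        """Return whoever registered this exact content first, or None."""
        digest = fingerprint(content)
        for record in self.records:
            if record["content_hash"] == digest:
                return record["creator"]
        return None

    def is_intact(self):
        """Recompute every hash and link; any edited record breaks the chain."""
        prev_hash = "0" * 64
        for record in self.records:
            body = {k: record[k] for k in ("creator", "content_hash", "prev_hash")}
            expected = fingerprint(json.dumps(body, sort_keys=True).encode())
            if record["prev_hash"] != prev_hash or record["record_hash"] != expected:
                return False
            prev_hash = record["record_hash"]
        return True

ledger = Ledger()
ledger.register("asha", b"original illustration bytes")
ledger.register("ben", b"a different poem")
print(ledger.first_claimant(b"original illustration bytes"), ledger.is_intact())
```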

As a technologist, this is a really exciting time. The number of amazing and genuinely helpful applications being built in generative AI is inspiring. But I’m also cautious of the current “dynamite” phase and don’t want old-school creators to be left behind. My hope is that we are not only able to bring them along but actually make the future more rewarding for them as well as make the creator ecosystem more inclusive.

PS (or why this essay title): In AI, a hallucination or artificial hallucination is a confident response by an AI that does not seem to be justified by its training data. Basically, it sometimes makes up confident gibberish. Like insisting that Earth is the fourth planet from the sun in the solar system.

As I have said numerous times over the last few months, hallucinations are an inevitable property of auto-regressive LLMs.
That's not a major problem if you use them as writing aids or for entertainment purposes.
Making them factual and controllable will require a major redesign.
— Yann LeCun (@ylecun) February 25, 2023

Disclaimer: This is a personal opinion piece. It does not represent the views of my employer: Google. I do not work in any of the AI teams at Google.
