Two AIs Better Than One? DALL-E 2 + ChatGPT = 🔥
Does a 1,000-word description make a better picture?
So I’ve been thinking about something.
If a picture is worth a thousand words, does a thousand-word description make a better picture?
At this point, everyone has heard about DALL-E 2. It’s an AI art generator that takes a human-written description of a picture and attempts to create it. If you’ve tried it out, one thing you learn pretty quickly is that the more detailed a description you write, the more accurate the generated photos become.
Now let's shift gears for just a second and talk about another AI tool called ChatGPT. Unlike DALL-E 2, ChatGPT is pretty well known for taking a short description of what you want written and turning it into a much more detailed, and sometimes better-written, version of what you started with.
So, that got me thinking. What if we combined the two?
What if I wrote a short description of a photo I wanted DALL-E 2 to create and, instead of giving it that short description directly, I gave it to ChatGPT first? Then I would ask ChatGPT to write a much longer and more detailed description of what I want and give that to DALL-E 2.
Will it make a better photo? Well, I guess there’s only one way to find out.
First, let's think of three relatively short but reasonably specific descriptions of photos I want created.
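Here is a rough sketch of that two-step idea in Python. It is not tied to any real API: `expand_text` and `generate_images` are hypothetical placeholders for "ask the text model" and "ask the image model", and the two `fake_` functions exist only so the sketch runs on its own.

```python
from typing import Callable, Dict, List


def compare_short_and_long(
    short_description: str,
    expand_text: Callable[[str], str],            # hypothetical: wraps the text model
    generate_images: Callable[[str], List[str]],  # hypothetical: wraps the image model
) -> Dict[str, object]:
    """Generate images from the short prompt and from its expanded version."""
    long_description = expand_text(short_description)
    return {
        "short_prompt": short_description,
        "short_images": generate_images(short_description),
        "long_prompt": long_description,
        "long_images": generate_images(long_description),
    }


# Stand-ins so the sketch runs without any API access.
def fake_expand(text: str) -> str:
    return text + " Rendered in high detail with dramatic lighting."


def fake_generate(prompt: str) -> List[str]:
    return [f"image for: {prompt[:30]}"]


result = compare_short_and_long("A horse jumping a fence.", fake_expand, fake_generate)
print(result["long_prompt"])
```

Keeping both sets of images in one result makes the side-by-side comparison the whole point of the function, rather than an afterthought.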
How about these three:
Create a photo-realistic image of a blonde man with a shocked face looking at a computer screen. The screen has crazy art on it.
Create a photo of a dog flying over a barn with horses in the background.
Create a photo of a horse jumping a fence. The fence is on fire.
Now, can ChatGPT make a better description and, in turn, a better photo generated by DALL-E 2? Just to make sure we are doing this fairly, I am going to modify all three photo descriptions the same way so ChatGPT will do what I want. Let's take each description we wrote for DALL-E 2 and put a couple of sentences in front of it:
“Write me a 300-character description of a photo. describe a photo in high detail where the basic information is this:”
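If you wanted to apply that same wrapper to every prompt without retyping it, a minimal sketch could look like this. The instruction text is copied from the line above; the names `EXPANSION_INSTRUCTION` and `build_expansion_prompt` are just illustrative choices.

```python
# Instruction text as quoted in the post; the constant name is illustrative.
EXPANSION_INSTRUCTION = (
    "Write me a 300-character description of a photo. "
    "describe a photo in high detail where the basic information is this:"
)


def build_expansion_prompt(short_description: str) -> str:
    """Prefix a short photo description with the same instruction every time."""
    return f"{EXPANSION_INSTRUCTION} {short_description.strip()}"


short_prompts = [
    "Create a photo-realistic image of a blonde man with a shocked face "
    "looking at a computer screen. The screen has crazy art on it.",
    "Create a photo of a dog flying over a barn with horses in the background.",
    "Create a photo of a horse jumping a fence. the fence is on fire.",
]

expansion_prompts = [build_expansion_prompt(p) for p in short_prompts]
for prompt in expansion_prompts:
    print(prompt)
    print()
```

Using one shared prefix is what keeps the comparison fair: the only thing that changes between the three runs is the short description itself.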
These are the results of taking the shorter photo descriptions and giving them to ChatGPT. Note: DALL-E 2 has a character limit, so this is as long as I could make them.
The photo depicts a blonde man with a look of shock and disbelief on his face as he gazes at a computer screen. The screen is filled with vibrant and abstract art, consisting of swirling patterns and bright colors. The man’s eyes are wide open and his mouth is slightly agape. The image is highly realistic, with the man’s features and expressions appearing lifelike and detailed. The computer screen is also rendered with great accuracy, showcasing the intricate and complex artwork displayed on it
The photo depicts a majestic dog soaring through the air, its ears flapping in the wind. The dog’s fur appears to be a rich brown color, and it seems to be in mid-leap. In the background, a red barn can be seen, with horses grazing peacefully in the field nearby. The sky is a clear blue, and the sun is shining down, casting a warm glow over the entire scene. The overall effect is one of freedom an
A majestic horse jumps over a blazing fence, its powerful muscles rippling as it soars through the air. The fire casts a warm glow on the horse’s glossy coat, highlighting its strength and beauty. The horse’s eyes are focused, determined as it clears the fence with ease. The flames of the burning fence add a sense of danger and excitement to the scene. It is a beautiful and powerful image of a horse in motion.
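The second description stops mid-word, which is what you would expect when text gets hard-cut at a character cap. Here is a small sketch of that cut, plus a gentler alternative that trims back to the last complete sentence. The 400-character default is my assumption based on where that description ends; the only thing known for sure is that a limit exists.

```python
def fit_to_limit(text: str, limit: int = 400) -> str:
    """Hard-cut text to at most `limit` characters (limit value is assumed)."""
    return text[:limit]


def fit_to_limit_at_sentence(text: str, limit: int = 400) -> str:
    """Trim to the last complete sentence that fits within `limit` characters.

    Falls back to a hard cut when no sentence-ending punctuation fits.
    """
    if len(text) <= limit:
        return text
    clipped = text[:limit]
    last_end = max(clipped.rfind("."), clipped.rfind("!"), clipped.rfind("?"))
    if last_end == -1:
        return clipped
    return clipped[: last_end + 1]


sample = "The sky is clear. The overall effect is one of freedom and joy."
hard_cut = fit_to_limit(sample, limit=40)
sentence_cut = fit_to_limit_at_sentence(sample, limit=40)
print(repr(hard_cut))
print(repr(sentence_cut))
```

The sentence-aware version gives up a few characters of detail in exchange for a prompt that does not end on a dangling fragment.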
Well, they are certainly longer, that’s for sure. And they seem slightly more detailed. ChatGPT certainly used its imagination.
Now for the big show. If I give these maximum-length descriptions to DALL-E 2, will it make better photos?
The first short description was: “Create a photo-realistic image of a blonde man with a shocked face looking at a computer screen. The screen has crazy art on it.”
This is the original set of photos created by DALL-E 2:
This is the long description generated by ChatGPT: “The photo depicts a blonde man with a look of shock and disbelief on his face as he gazes at a computer screen. The screen is filled with vibrant and abstract art, consisting of swirling patterns and bright colors. The man’s eyes are wide open and his mouth is slightly agape. The image is highly realistic, with the man’s features and expressions appearing lifelike and detailed. The computer screen is also rendered with great accuracy, showcasing the intricate and complex artwork displayed on it”
And these are the photos created from the longer description:
The second short description was: “Create a photo of a dog flying over a barn with horses in the background.”
This is the original set of photos created by DALL-E 2:
This is the long description generated by ChatGPT: “The photo depicts a majestic dog soaring through the air, its ears flapping in the wind. The dog’s fur appears to be a rich brown color, and it seems to be in mid-leap. In the background, a red barn can be seen, with horses grazing peacefully in the field nearby. The sky is a clear blue, and the sun is shining down, casting a warm glow over the entire scene. The overall effect is one of freedom an”
And these are the photos created from the longer description:
The final short description was: “Create a photo of a horse jumping a fence. The fence is on fire.”
This is the original set of photos created by DALL-E 2:
This is the long description generated by ChatGPT: “A majestic horse jumps over a blazing fence, its powerful muscles rippling as it soars through the air. The fire casts a warm glow on the horse’s glossy coat, highlighting its strength and beauty. The horse’s eyes are focused, determined as it clears the fence with ease. The flames of the burning fence add a sense of danger and excitement to the scene. It is a beautiful and powerful image of a horse in motion.”
And these are the photos created from the longer description:
Well, what do you think? Did chaining the two AIs actually help and produce a better result? Until next time, happy coding!