Midjourney
Introduction.
Midjourney is a generative artificial intelligence program and service created and hosted by San Francisco-based independent research lab Midjourney, Inc. It was founded by David Holz, previously co-founder of Leap Motion.
Midjourney generates images from natural language descriptions, called “prompts”.
Users create artwork with Midjourney using Discord bot commands.
What sets Midjourney apart from other text-to-image models is its ability to create highly detailed, precise, and defined images. These images can have dimensions of up to 1,792 x 1,024 pixels. Descriptive text is needed to instruct the AI to create the image. The more information provided in the description, the more accurate the resulting image will be.
Model versions.
The company has been improving its algorithms, releasing new model versions every few months. Version 2 of its algorithm was launched in April 2022 and version 3 on July 25, 2022. On November 5, 2022, the alpha iteration of version 4 was released to users, and on March 15, 2023, the alpha iteration of version 5 followed. The 5.1 model is more ‘opinionated’ than version 5, applying more of its own stylization to images, while the 5.1 RAW model adds improvements while working better with more literal prompts. Version 5.2 was released afterwards, with further improvements in image quality.
Midjourney is currently only accessible through a Discord bot on their official Discord server, by direct messaging the bot, or by inviting the bot to a third party server. To generate images, users use the /imagine command and type in a prompt; the bot then returns a set of four images. Users may then choose which images to upscale or to create variations of.
How it works.
Midjourney is an example of generative AI that specializes in creating images based on textual prompts. It is a product of the evolving field of diffusion models.
Diffusion models are a type of generative model that has gained significant popularity in recent years due to its ability to generate high-quality data, such as images. They are fundamentally different from other generative methods and are based on the idea of decomposing the image generation process into many small “denoising” steps.
Here’s a step-by-step explanation of how they work:
Forward Process (Diffusion Process): This process involves gradually adding Gaussian noise to the input data through a series of steps. This is also known as the diffusion process. The input data is progressively noised, transforming it into a latent variable.
Reverse Process (Reverse Diffusion Process): After the forward process, a neural network is trained to recover the original data by reversing the noising process. This is also known as the reverse diffusion process. By being able to model this reverse process, we can generate new data.
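The forward process described above can be sketched in a few lines of Python. This is a generic, textbook-style illustration and not Midjourney's actual implementation, which is not public. The linear noise schedule, the step count, the toy four-value "image" and all function names are assumptions made for the example. The reverse process is not shown because it requires a trained neural network.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, rising linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def forward_diffusion(x0, betas, rng):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise at every step."""
    x = list(x0)
    trajectory = [list(x0)]
    for beta in betas:
        x = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for v in x]
        trajectory.append(x)
    return trajectory

def signal_fraction(betas):
    """Share of the original signal left after all steps: sqrt(prod(1 - beta_t))."""
    return math.sqrt(math.prod(1.0 - b for b in betas))

rng = random.Random(0)            # fixed seed so the run is repeatable
pixels = [0.9, -0.5, 0.2, 0.7]    # toy "image": four pixel values (hypothetical)
betas = linear_beta_schedule(1000)
trajectory = forward_diffusion(pixels, betas, rng)

print("start:", trajectory[0])
print("end:  ", [round(v, 3) for v in trajectory[-1]])
print("signal left:", round(signal_fraction(betas), 4))
```

With this schedule almost none of the original signal survives the last step, so the final values are close to pure Gaussian noise. That is the starting point from which a trained network would denoise step by step to generate a new image.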
Created by Midjourney. Image by jcmm.art
Midjourney Guide.
Midjourney is currently only accessible through a Discord bot on their official Discord server, by directly messaging the bot, or by inviting the bot to a third party server.
To access the Midjourney server you need to have a Discord account.
L35 already has a Discord account and a Midjourney Standard Subscription.
To access the L35 server on Discord, go to the Midjourney website: https://www.midjourney.com/home/
You may see the following screen.
A new window will open for you to Sign in.
If you don’t have a Discord account you must click on the Register link and create one.
If you are already registered, just fill in your details in the pop-up window to access your Discord account.
After clicking on Authorize, you will enter Discord, where the L35 server will appear.
From now on you have access to the Midjourney bot inside your Server (it is recommended to create a bookmark in your browser for later access).
- On the left are the servers you have access to.
- In the adjacent column are the characteristics of your server, with the channels created on it.
- The main area. This is where you will write your prompts and where Midjourney will generate the images.
Discord Commands.
To interact with the Midjourney bot you must use one of the slash commands.
Commands are used to create images, change default settings, monitor user info, and perform other helpful tasks.
The list of available slash commands pops up when you type ‘/’.
Generate images with the Midjourney bot.
All prompts to generate images with Midjourney bot should start by typing “/imagine” + “Enter” in the message field.
This will cause the request for your prompt to appear: ‘/imagine prompt:’
You can also select the /imagine command from the list of available slash commands that pop up when you type ‘/’.
Type a description of the image you want to create (known as a prompt) in the prompt field, then send your message.
After submitting your text prompt, the Midjourney Bot processes your request, creating a grid of four image options.
Below the generated images, buttons will appear to modify or redo the generation.
Upscale Buttons
The U1, U2, U3 and U4 buttons upscale an image, generating a larger version of the selected image with more detail.
Redo Button
The redo (re-roll) button reruns a job. In this case, it would rerun the original prompt, producing a new grid of images. If “Remix mode” is active, you can rewrite the prompt.
Variation Buttons
V1 V2 V3 V4 buttons create incremental variations of the selected grid image. Creating a variation generates a new image grid similar to the chosen image’s overall style and composition.
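As an illustration of how the button labels relate to the image grid, the following Python sketch translates a label such as U3 into an action and a grid position. It is not part of Midjourney or Discord; the function and dictionary names are hypothetical, and it assumes the grid is numbered left to right, top to bottom.

```python
import re

# Assumed numbering of the 2 x 2 grid: left to right, top to bottom.
GRID_POSITIONS = {1: "top left", 2: "top right", 3: "bottom left", 4: "bottom right"}
ACTIONS = {"U": "upscale", "V": "variation"}

def parse_button(label):
    """Translate a label such as 'U3' into an (action, grid position) pair."""
    match = re.fullmatch(r"([UV])([1-4])", label.strip().upper())
    if match is None:
        raise ValueError(f"not a grid button: {label!r}")
    return ACTIONS[match.group(1)], GRID_POSITIONS[int(match.group(2))]

for label in ("U1", "U3", "V4"):
    action, position = parse_button(label)
    print(f"{label}: {action} of the {position} image")
```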
Image editing.
When an image is upscaled, a series of options to edit it becomes available.
Vary Buttons:
- The Vary (Strong) button creates a grid of four strong variations of the image.
- The Vary (Subtle) button creates a grid of four subtle variations of the image.
- Vary (Region) button allows you to edit certain areas of the image.
Zoom and Pan Buttons.
Creative Prompts.
A creative prompt for Midjourney should be well-structured and provide clear guidance to the AI to generate the desired artistic output.
The following is an outline for composing prompts:
- [PREFIX]: Artistic Medium & Image Style.
- [SUBJECT or CONTENT]: Main Idea or Subject & Architecture Style & Materials, Finishes and Colors & Setting and Landscaping & Time Period and Technology Level & Unique Features or Challenges.
- [SUFFIX]: Lighting and Atmosphere & Emotional Tone or Mood & Camera and Depth of Field & Perspective and Composition & References and Inspirations.
- [PARAMETERS] Parameters are always added to the end of a prompt. You can add multiple parameters to each prompt. Check the list of parameters below.
[PREFIX]
- Medium & Style: Start by referring to the artistic medium used (e.g., oil painting, watercolor, photography) and the specific style or approach used in creating visual art.
- Image Style: Specify the overall artistic style you want the image to follow. This could include terms like minimalistic, abstract, photorealistic, impressionistic, etc. Describing the style helps the model understand the visual treatment you’re aiming for.
[SUBJECT or CONTENT]
- Main Idea or Subject: Start with a concise statement or description of the main idea or subject you want the AI to focus on. This sets the central theme of the artwork.
- Architecture Style: Describe the architectural style (modern, classical, futuristic, Bauhaus) and any specific details you want in the image.
- Materials and Textures: Detail the materials used in the building’s construction, such as marble, glass, wood, etc., along with their textures and reflective properties.
- Setting and Landscaping: Specify the location, provide details about the landscaping elements around the building, like gardens, pathways, water features, and any outdoor amenities.
- Time Period and Technology Level: Specify the time period the architecture is inspired by, whether it’s contemporary, historical, or futuristic. Mention the level of technology and innovation present in the design.
- Unique Features or Challenges: If there are any specific unique features or challenges related to the architecture, mention them. For instance, if the building needs to be environmentally sustainable or incorporate a specific cultural motif.
[SUFFIX]
- Lighting and Atmosphere: Provide information about the lighting scheme, sources of light, and how light interacts with surfaces to create realistic shadows and highlights.
- Style and Mood: Specify the style or artistic approach you want for the image. Describe the mood, atmosphere, or emotions you want the artwork to convey. Use adjectives that evoke the desired feelings.
- Camera and Depth of Field: Define the camera perspective and depth of field in the generated image. Describe the angle and height of the camera (e.g., bird’s eye view, ground level, interior view). Provide information about the depth of field, whether you want a sharp focus on the building or a more blurred background, which adds realism and aesthetic depth to the scene.
- Perspective and Composition: Specify the viewing angle, focal points, and artistic composition preferences to influence the visual storytelling of the image.
- References and Inspirations: Provide references to existing architectural designs, artworks, or images that capture the essence of what you’re looking for. Mention architects, photographers or artists whose style you want to include in your image. This gives the model visual cues to generate more accurate results.
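The outline above can be turned into a small helper that assembles the descriptive part of a prompt in the recommended order. This is only a sketch: the function name, the comma-separated layout and the example phrases are assumptions for illustration, and Midjourney does not require this exact format.

```python
def compose_prompt(prefix, subject, suffix):
    """Join the non-empty descriptive phrases, in outline order, with commas."""
    sections = [prefix, subject, suffix]
    phrases = [p.strip() for section in sections for p in section if p.strip()]
    return ", ".join(phrases)

prompt = compose_prompt(
    # [PREFIX]: artistic medium and image style
    prefix=["photorealistic architectural photography"],
    # [SUBJECT or CONTENT]: main idea, materials, setting
    subject=[
        "modern concert hall",
        "glass and weathered steel facade",
        "set in a waterfront park",
    ],
    # [SUFFIX]: lighting, mood, camera
    suffix=[
        "golden hour lighting",
        "calm atmosphere",
        "ground level view",
        "shallow depth of field",
    ],
)

# The text that would be entered in Discord after choosing the /imagine command.
command = f"/imagine prompt: {prompt}"
print(command)
```

Keeping the three sections as separate lists makes it easy to swap one element, for example the lighting, while leaving the rest of the description unchanged between generations.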
Below is a gallery of images with the different types of elements, so that you can use them in your prompts.
Parameters are options added to a prompt that change how an image generates.
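Because parameters always go at the end of a prompt, they can be appended mechanically once the description is written. The sketch below shows one way to do that in Python. The helper name is hypothetical, and the parameters used (--ar for aspect ratio, --stylize and --tile) are given as examples only; which parameters and values are accepted depends on the model version in use.

```python
def add_parameters(prompt, parameters):
    """Append '--name value' pairs after the descriptive text of a prompt."""
    parts = [prompt.strip()]
    for name, value in parameters.items():
        # A value of None stands for a flag that takes no argument.
        parts.append(f"--{name}" if value is None else f"--{name} {value}")
    return " ".join(parts)

prompt = add_parameters(
    "watercolor painting of a Bauhaus pavilion at dusk",
    {"ar": "16:9", "stylize": 250, "tile": None},
)
print(prompt)
```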