#img2img improving

mystic rover
#

Hi!
I have a very bad UX with img2img in the current gui.

For now I need to load an image from the left menu.
But if I have previously generated a picture, it is in the center area and the «generate» button is above it.
So when I load a picture, I often get confused about which one will be processed. Because I have to click the «generate» button elsewhere, they don't seem related.

I think it's better to either separate img2img completely into its own tab, or embed it directly into the image area in the center, so I can drag an image from the right area or from a folder and have it replace the active one, with the understanding that img2img is now active, not txt2img.
But it's better to separate the whole thing so as not to overload the interface.
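
A minimal sketch of the "make the active mode obvious" idea, assuming the UI keeps one optional init image per request. `GenerationRequest`, `resolve_mode` and `button_label` are hypothetical names for illustration, not existing functions in the app.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenerationRequest:
    prompt: str
    # Hypothetical field: the image dropped onto the center area, if any.
    init_image: Optional[str] = None


def resolve_mode(request: GenerationRequest) -> str:
    """Return which pipeline a click on "generate" would run."""
    return "img2img" if request.init_image else "txt2img"


def button_label(request: GenerationRequest) -> str:
    """Put the active mode on the button so the image and the button stay linked."""
    if resolve_mode(request) == "img2img":
        return f"Generate from image ({request.init_image})"
    return "Generate from prompt"
```

With something like this, dragging an image in or clearing it would be the only thing that switches modes, and the button text would always say which image is about to be processed.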

The same goes for the realsrgan and gfpgan options - they are available at the top in the central area, but are also present on the left. At the same time, they can be disabled on the left, but still work. But if they are disabled on the left, they, if I remember correctly, have no effect on processing when the image is loaded in img2img. That's all a bit confusing.

**The main question is: do I need to describe algorithms (flows, mindmaps) for the main functions? How are they related, what affects what, how should it all work? Or do we solve these problems with modularity and nodes, and the current UI is just too raw?**

@merry ivy @vestal flame

merry ivy
#

I think there's meaningful UI/UX work to be done on the existing UI - It's functional but not streamlined for common workflows - I think, realistically, we don't know "who we want to be for", and are just collecting all of the tools/capabilities right now

#

The current UI requires a level of familiarity with the different functions that isn't exactly "noob friendly"

#

I won't have time until this weekend (Damn you, day job!) to do this, but I want to put together some abstract flows/mindmaps as well - I'd love to see what you come up with

mystic rover
#

Yeah, I still don't understand it clearly. But with all the features we have, it could be a pro tool. Maybe with a simplified GUI for beginners.

vestal flame
#

It is indeed confusing. The ESRGAN and GFPGAN toggles on the left are to enable and disable those as extra post-generation steps which are taken automatically after generating an image. This is just how the old UI was built - you couldn't just upscale an image independently; you could only add upscale as a post-generation step.

The buttons above the current image are to run ESRGAN/GFPGAN independently on the image.

For both post-generation and independent running of ESRGAN/GFPGAN, we need to allow configuration of ESRGAN/GFPGAN, so the settings are always on the left.

I'm not sure about separating img2img into a separate tab, because you still need to provide a prompt.

#

What if we have two tabs - Generation and Postprocessing. On the Generation tab, there are toggles for ESRGAN and GFPGAN (and any others we add), which enable running those steps immediately after generation. The settings are not displayed there, though.

On the postprocessing tab, we have all the settings related to those postprocessing methods.
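
A rough sketch of how that split could look in code, assuming one shared settings object that both paths read. All names here (`PostprocessSettings`, `run_esrgan`, `run_gfpgan`, `generate`, `postprocess`) are hypothetical stand-ins, and images are plain strings so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class PostprocessSettings:
    # Hypothetical settings, edited only on the Postprocessing tab.
    esrgan_scale: int = 2
    gfpgan_strength: float = 0.8


def run_esrgan(image: str, settings: PostprocessSettings) -> str:
    # Stand-in for the real upscaler call.
    return f"esrgan_x{settings.esrgan_scale}({image})"


def run_gfpgan(image: str, settings: PostprocessSettings) -> str:
    # Stand-in for the real face-restoration call.
    return f"gfpgan_{settings.gfpgan_strength}({image})"


STEPS: Dict[str, Callable[[str, PostprocessSettings], str]] = {
    "esrgan": run_esrgan,
    "gfpgan": run_gfpgan,
}


def postprocess(image: str, step: str, settings: PostprocessSettings) -> str:
    """Independent run on an existing image (Postprocessing tab)."""
    return STEPS[step](image, settings)


def generate(prompt: str, enabled_steps: Iterable[str],
             settings: PostprocessSettings) -> str:
    """Generation tab: the toggles only decide which steps run afterwards."""
    image = f"image({prompt})"  # stand-in for the actual generation
    for step in enabled_steps:
        image = postprocess(image, step, settings)
    return image
```

The point of the sketch is that the toggles and the settings are separate inputs: turning a toggle off never changes the settings, and the independent path never looks at the toggles.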

vestal flame
#

Main view

vestal flame
#

Image detail view

random scaffold
#

Not sure if it can help with UI design, but that's how I rearranged my stuff yesterday.
(And yes, I just renamed my "advanced options" to "post processing". :D)

White bars = always applied
Gold dots = The image is re-done with

I sorted the img2img with "variations", because I think that's kiiind of how it is used...? But I also kinda like adding it to the prompt.

vestal flame
#

Great idea with the sliders and colors to indicate if it's always applied vs postprocessed

hollow robin
#

I just want to thank everyone for the helpful discussion here. Good UX design is an under-appreciated art. Most people know good UX when they see it, but have difficulty enunciating it from scratch. I think @psychedelicious's initial design is a great foundation for further improvements. Keep the ideas coming, but be aware that @psychedelicious is doing all the work here; we don't want him to burn out!

merry ivy
#

I'm going to try to chip in where I can as a nooby 🙂

mystic rover
#

An example of pro use (and very similar to my cases): https://jamesoclaire.com/2022/10/03/generating-ads-with-stable-diffusion/ - it clearly shows which features need polishing for «daily use»: img2img plus inpainting/outpainting. It looks to me like a primitive photo editor built on SD functions, with basic features like simple painting, resize, select, crop, layers. At this point, is it better to use a plugin for some external software, or to try to implement it in our GUI?

merry ivy
#

This quote is 🔥

#

“Image generation needs more modular components for artists and marketers alike to manipulate. For example, outputting backgrounds, characters and objects as separate images and then composing them. This would allow flexibility later when working with the image. The landscape of image generation has been changing quickly and more and more the tools built around Stable Diffusion. I foresee further chaining together multiple AI algorithms and editing tools to enable more diverse use cases. I think what may come out of these are new specialized roles for artists who can create images from these tool sets.”

#

This is the concept library and nodes!!!

#

Well maybe an advanced version but still

merry ivy
#

Some inspiration to take there

mystic rover
#

yeah, I will do some concepts today

#

I like google autodraw too https://www.autodraw.com
in terms of UX

mystic rover
#

So, I started collecting different implementations of image editors in search of specific interesting features that we can build into our own. Or that could be useful for us in general.

  1. I've found an interesting node-based editor on GitHub (visually quite like Blender) https://github.com/GimelStudio/GimelStudio (since our UI is also node-based).

  2. Cutting and pasting image parts without edges https://github.com/vittorione94/Poisson-Image-Editing (may be useful for merging objects from two generations)

  3. The weird thing with face detection https://github.com/packetsss/Image-Editor (idk do we need this or not)

  4. I find the mirror-painting feature very useful - try https://openprocessing.org/user/34627. There's also (or was) an awesome Alchemy http://al.chemy.org/features/ which is a weird drawing tool. Imagine how easy it would be to draw faces, architecture, other objects with symmetry for img2img.

  5. Not sure if this would be useful for everyone, but as an experiment - pixel art?

  6. I like the simplicity of the interface https://www.autodraw.com/

What I want from editor:

  1. All possible filters 😉
    Basic are: blur, sharpen, negative, grayscale, contrast, brightness, hue, saturation.

Tools: layers (three are enough); transparency; painting with different colors and brush sizes, selection, cropping, erasing, filling; rotation, flip horizontally and vertically, history.

Many effects are available from imagemagick https://imagemagick.org/script/examples.php.
There's also a great tools set at https://pillow.readthedocs.io/en/stable/reference/ImageFilter.html
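
To make the basic filters concrete, here is a small per-pixel sketch. It deliberately uses only Python's standard `colorsys` module instead of Pillow or ImageMagick so it runs without dependencies. The function names are made up for illustration, and a real editor would apply these to whole image buffers rather than single tuples.

```python
import colorsys
from typing import Tuple

RGB = Tuple[int, int, int]


def adjust_pixel(pixel: RGB, hue_shift: float = 0.0,
                 saturation: float = 1.0) -> RGB:
    """Shift hue (as a fraction of a full turn) and scale saturation."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + hue_shift) % 1.0
    s = max(0.0, min(1.0, s * saturation))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return (round(r * 255), round(g * 255), round(b * 255))


def negative(pixel: RGB) -> RGB:
    """Invert each channel."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


def grayscale(pixel: RGB) -> RGB:
    """Luma-weighted gray (ITU-R BT.601 weights)."""
    r, g, b = pixel
    y = round(0.299 * r + 0.587 * g + 0.114 * b)
    return (y, y, y)
```

For example, `adjust_pixel((255, 0, 0), hue_shift=0.5)` turns pure red into cyan, and `saturation=0.0` fully desaturates a pixel.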

Some of the interface and tools are at https://github.com/FahimF/sd-gui

@vestal flame @merry ivy @ocean canyon

ocean canyon
#

How many of these help with img2img? I feel like building an API to make it easy to integrate into other tools (like Photoshop or Photopea) is probably the right move.

merry ivy
#

Generally agree. Think maintaining a photo-editor in-app would be a distraction from core

#

I've been looking into Photoshop plugin dev, and it seems relatively straightforward... And I say that without having started working on it and only working through a "scale layer up/down" plugin example, but still - Think this is more in line of our "pro tools" ambition

ocean canyon
#

I guess Photopea and Photoshop use very similar plugin structure, and it didn't look terrible (though I didn't want to play with JavaScript that much =/)

merry ivy
#

yeah seems like you would just pass an input/command from the plugin to the invoke backend, and then push whatever comes back to a new layer OR current selection
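
A sketch of what that round trip might look like on the plugin side. Everything here is an assumption for illustration: the JSON field names, the `"new_layer"`/`"selection"` targets and the dict standing in for the editor document are invented, and none of this is the real backend API.

```python
import base64
import json
from typing import Any, Dict

VALID_TARGETS = ("new_layer", "selection")


def build_request(prompt: str, image_bytes: bytes,
                  target: str = "new_layer") -> str:
    """Serialize a hypothetical img2img command for the backend."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target: {target}")
    payload = {
        "command": "img2img",
        "prompt": prompt,
        # Binary image data is base64-encoded so it fits in JSON.
        "init_image": base64.b64encode(image_bytes).decode("ascii"),
        "target": target,
    }
    return json.dumps(payload)


def apply_result(document: Dict[str, Any], target: str,
                 result_bytes: bytes) -> Dict[str, Any]:
    """Push the returned image to a new layer or into the current selection."""
    if target == "selection":
        document["selection_fill"] = result_bytes
    else:
        document["layers"].append(result_bytes)
    return document
```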

mystic rover
#

You’re right. I wrote this as a set of ideas, not a guide to action. These are, of course, fantasies about the perfect product (and some things that are easy to add, though they're not that necessary).

I think some of the basic tools are worth adding, because Photoshop is too clumsy for such purposes. Also, I wouldn't want to interrupt my work in invoke to, for example, cut out part of a picture and paste it into another, then start generating.

We also already see similar primitive editors in other SD GUIs, with cut, erase and basic drawing. They just barely work on macs. So if at least some of the above would work as a plugin, it would be very handy.

vestal flame
#

Can you drag an image from the browser to Photoshop? Then Ctrl+A, Ctrl+C in Photoshop. We can implement Ctrl+V in the web UI to paste it in, and have a primitive editor for basic tasks. This feels like a happy medium.

#

Plug-ins for Photoshop etc. seem kinda tedious

merry ivy
#

You can right click copy and paste to photoshop relatively easily

#

The real “wow” use case is selecting a nonstandard area and having SD generate in that area
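
One way a "nonstandard area" could be turned into something the generator understands is to rasterize the selection outline into a boolean mask. Below is a minimal sketch using the standard ray-casting point-in-polygon test, sampled at pixel centers. The function names are hypothetical, and a real implementation would probably let the canvas do the rasterizing.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def selection_to_mask(width: int, height: int,
                      polygon: Sequence[Point]) -> List[List[bool]]:
    """True where the pixel center lies inside the selection outline."""
    return [
        [point_in_polygon(col + 0.5, row + 0.5, polygon) for col in range(width)]
        for row in range(height)
    ]
```

The resulting mask marks the region to regenerate; everything outside it would be left untouched.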

vestal flame
#

Ya we’ll get that working in app, not too difficult we just need to get the base UI figured out

mystic rover
#

btw, why do we even need Photoshop or anything external to combine a few images and run generation?

In general, I want to:

  1. Generate several images.
  2. Cut something from image 2 and paste it into image 1; select image 3, cut from it, and paste again into image 1.
    2.1. Optional: change the colors of those parts a bit, shift hue or desaturate them, add more contrast.
  3. Add some drawing or paste an image from the clipboard. Generate from the result.
  4. Select some area and regenerate the selected part, like DALL-E does.
  5. Optional: outpaint and repeat steps 1-3.

And after that I will open a Ps or Affinity, and add some text, some noise and other effects.

That's a real use case. It's like a collage: cut, paste, move to the right place to set a composition. Or draw by hand, then generate, or select and regenerate. Right now I must generate, then combine images in an external editor, then generate again in Invoke. Yes, there is a Ps plugin. But it's slow and unstable, idk if it works on my Mac, and why do I need Ps/GIMP/Krita for that?

And, as a user, I don't wanna think "do I need inpainting or outpainting" or something else. It would be great if we could merge them into one GUI as tools.
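
The collage part of that workflow (cut from one image, paste into another) is small enough to sketch. This assumes images are plain row-major lists of pixels. `cut` and `paste` are hypothetical helper names, and a real editor would work on canvas or image buffers instead.

```python
from typing import Any, List

Image = List[List[Any]]  # rows of pixels; the pixel type does not matter here


def cut(image: Image, left: int, top: int, width: int, height: int) -> Image:
    """Copy a rectangular region out of an image (the source is not modified)."""
    return [row[left:left + width] for row in image[top:top + height]]


def paste(base: Image, patch: Image, left: int, top: int) -> Image:
    """Return a copy of base with patch placed at (left, top), clipped to base."""
    result = [list(row) for row in base]
    for dy, patch_row in enumerate(patch):
        for dx, pixel in enumerate(patch_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(result) and 0 <= x < len(result[y]):
                result[y][x] = pixel
    return result
```

The composed result would then go back in as the init image for the next img2img pass.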

merry ivy
#

I can see the argument for simple photo editing perhaps - I just think there’s a line of “too much of a photo editor to maintain”

#

Maybe we just bake GIMP into the app? Lol

hollow robin
#

Has anyone tried cutting out a piece of an image (say a person), pasting it into another image, and then applying img2img to merge them organically?

merry ivy
#

Yep!

#

This is a standard part of my workflow

#

Another example of how img2img can be used - my wife wanted a woman breaking out of a statue (literally shedding the stone) and so I painted cracks and did some hue adjustments to get SD to pick up flesh vs stone after a few iterations of feeding forward

#

Higher strength needed to get the “merge” to look decent (less necessary if you do a more natural composition job in photo editor)
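
For context on what "strength" controls: in the reference Stable Diffusion img2img script, as far as I recall, strength is a value between 0 and 1 that decides how many of the sampler steps are actually run on the noised init image. A tiny sketch of that relation (the function name is made up, and other samplers or UIs may round differently):

```python
def denoising_steps(strength: float, total_steps: int) -> int:
    """Number of denoising steps applied to the init image.

    Higher strength means more noise is added and more steps are run,
    so the result drifts further from the pasted-together input.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return int(strength * total_steps)
```

So with 50 steps, a strength of 0.75 runs 37 of them, while 0.25 runs only 12 and stays much closer to the rough composite.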

#

A more amusing output meant to convey a buff bearded chef fighting an evil Bobby Kotick in a lightsaber battle. A few trips to PS and through img2img on this one 🙂

mystic rover
#

Basic instruments:

Tools
Select area: rectangle, circle, free
Cut to clipboard
Paste to layer
Delete layer
Rotate layer
Drag layer
Undo/Redo — maybe up to 5-10 steps
Zoom — each image area separately

Draw
Square, circle, triangle, free
Brush: circle, square
Brush Size: from 1px to 500px or so

Inpaint
Select area, hit button, add prompt in modal window
Outpaint
Select area, hit button, add prompt in modal window (I still have doubts about a separate prompt area and modal windows, but it may be better than doing it all from the same text area)

Filters
Hue
Saturation
Contrast
Gaussian Blur — for effects and experiments
Add noise — may be useful for more random generations?
Sharpen — to add some details between generations

two spaces: left for selected donor/reference image, right for result (just like at new img2img)
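
The bounded Undo/Redo from the tools list can be sketched with a fixed-size stack. This assumes each editor state can be stored as a snapshot, and `History` is a hypothetical class name.

```python
from collections import deque
from typing import Any, Deque, List


class History:
    """Undo/redo with a cap on how many undo steps are kept."""

    def __init__(self, limit: int = 10) -> None:
        # deque(maxlen=...) silently drops the oldest snapshot when full.
        self._undo: Deque[Any] = deque(maxlen=limit)
        self._redo: List[Any] = []

    def push(self, state: Any) -> None:
        """Call with the current state right before applying an edit."""
        self._undo.append(state)
        self._redo.clear()  # a new edit invalidates the redo chain

    def undo(self, current: Any) -> Any:
        """Return the previous state, or current if there is nothing to undo."""
        if not self._undo:
            return current
        self._redo.append(current)
        return self._undo.pop()

    def redo(self, current: Any) -> Any:
        """Return the next state, or current if there is nothing to redo."""
        if not self._redo:
            return current
        self._undo.append(current)
        return self._redo.pop()
```

Storing full snapshots is the simplest option. With large images, a cap of 5-10 steps keeps the memory use predictable.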

first sketch

ocean canyon
#

Is there an open source web image editor we could just use or provide integration for? Building and maintaining an image editor isn't trivial.

ocean canyon
#

The interface for selection, drawing, moving, rotating, etc. are the hard part. Even more so with layers.

merry ivy
#

poking around to see whats out there

#

doesn't seem to have the type of masking/brush selection support we'd want though.

mystic rover
#

I believe, masking is more or less the same

#

features:
Free drawing
Line drawing

Mask Filter

merry ivy
#

Yeah, but unclear that it would be trivial to turn the brush into a masking tool

#

It may be - just saying it's not "out of the box"

vestal flame
#

our "masking" is simply erasing parts of the image

#

we've been looking around at ready-made libraries for the image editor (and inpainting/outpainting). reviewing this one

#

this is probably the best one I've seen

hollow robin
#

For good results, the masking operation should set the alpha of the region to some partially transparent value. If it erases the RGB values then the inpaint operation won't work properly.
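
A small sketch of that difference, assuming RGBA pixels stored as tuples. The helper name and the default alpha of 128 are arbitrary choices for illustration, not values taken from the inpainting code.

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]


def mask_region(image: List[List[RGBA]], mask: List[List[bool]],
                alpha: int = 128) -> List[List[RGBA]]:
    """Lower alpha inside the mask while keeping the RGB values intact."""
    return [
        [(r, g, b, alpha) if masked else (r, g, b, a)
         for (r, g, b, a), masked in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]
```

The important part is that `r`, `g` and `b` pass through unchanged: the masked area is marked via alpha only, so the original colors are still there for the inpainting step to use.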

#

As far as I can see, ToastUI does not offer an eraser option and does not support alpha. There are a couple of feature requests I found.

vestal flame
#

I think we could add those missing pieces, but it may be tricky to add infinite canvas (for outpainting)

#

I'm going to have a go at making my own

hollow robin
#

It's tk based, and so the underlying library supports transparency. Probably not too hard to implement an eraser. I do not recall if you can dynamically resize a tk canvas.

#

Actually you can resize a tk canvas. So you can make the image a widget that is embedded in a larger canvas, and then click on the image's edges in order to pop out a rectangle for outpainting.
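
The "image inside a larger canvas" idea does not depend on any particular toolkit. As a data-level sketch (RGBA tuples in nested lists, hypothetical function name), growing the canvas for outpainting is just padding with fully transparent pixels:

```python
from typing import List, Tuple

RGBA = Tuple[int, int, int, int]
TRANSPARENT: RGBA = (0, 0, 0, 0)


def expand_canvas(image: List[List[RGBA]], left: int = 0, top: int = 0,
                  right: int = 0, bottom: int = 0) -> List[List[RGBA]]:
    """Return a larger image with transparent padding around the original."""
    new_width = len(image[0]) + left + right
    rows = [[TRANSPARENT] * new_width for _ in range(top)]
    for row in image:
        rows.append([TRANSPARENT] * left + list(row) + [TRANSPARENT] * right)
    rows.extend([TRANSPARENT] * new_width for _ in range(bottom))
    return rows
```

The transparent border is then the region the outpainting step would be asked to fill.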

#

Rolling your own might not be too hard if the primary goal is to support inpainting and outpainting. I used to write Tk applications, and it was a real breeze to do basic drawing and erasing.

#

The bigger challenge is supporting interactive drawing where someone sketches out an image and then uses img2img to elaborate on it. There was a "sketch" PR at the very beginning of this project that had integrated the CLI with a desktop sketching tool.

vestal flame
#

Tk? I think the toastui editor is pure JavaScript

#

We will be using HTML canvas for the editing - it’s not difficult to do everything that’s been requested there including a sketch pad.

hollow robin
#

You're right. It's pure HTML

#

and javascript.

vestal flame
#

I am on vacation and don’t have much coding time for the next few weeks else I’d be working on it now

hollow robin
#

I followed an irrelevant link and went down a rabbit hole.

vestal flame
#

Haha whoops