Input a prompt and an image to generate a mask representing areas of the image matched by the prompt. This mask can be used in the Create Denoise Mask node or many other applications.
The raw output of Clipseg (in this case, clipseg-rd64-refined) is a low-resolution grayscale image, so this node includes options to apply smoothing and thresholding to yield a pure black-and-white image at full size. Additional options expand or contract the mask, apply a final blur, and invert the mask (swapping between black-on-white and white-on-black).
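To make the post-processing idea concrete, here is a minimal pure-Python sketch of upscaling, smoothing, thresholding, and inverting a small grayscale map. It is not the node's code: the function names, the nearest-neighbour upscale, the box blur, and the order of the steps are all assumptions chosen for illustration, and a real implementation would normally use an image library.

```python
# Illustrative sketch only: all names and the choice of filters are assumptions,
# not taken from clipseg.py. Masks are lists of rows of floats in the range 0..1.

def upscale_nearest(mask, factor):
    """Repeat every pixel `factor` times horizontally and vertically."""
    return [[v for v in row for _ in range(factor)]
            for row in mask for _ in range(factor)]

def box_blur(mask, radius):
    """Average each pixel with its neighbours within `radius` (0 = no change)."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [mask[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def threshold(mask, cutoff=0.5):
    """Map every pixel to pure white (255) or pure black (0)."""
    return [[255 if v >= cutoff else 0 for v in row] for row in mask]

def invert(mask):
    """Swap black and white in a 0/255 mask."""
    return [[255 - v for v in row] for row in mask]

def postprocess(raw, factor, blur_radius=1, cutoff=0.5, inverted=False):
    """Upscale, smooth, threshold, and optionally invert a low-resolution map."""
    mask = upscale_nearest(raw, factor)
    mask = box_blur(mask, blur_radius)
    mask = threshold(mask, cutoff)
    return invert(mask) if inverted else mask

if __name__ == "__main__":
    raw = [[0.9, 0.1],
           [0.1, 0.1]]  # a made-up 2x2 "low resolution" map
    for row in postprocess(raw, factor=2, blur_radius=0):
        print(row)
```

Smoothing before thresholding is what turns the blocky upscaled map into softer region boundaries; with `blur_radius=0` the sketch simply thresholds the upscaled pixels.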
I have included two files: clipseg.py, which contains the Text to Mask (Clipseg) node, and clipseg_adv.py, which contains an advanced version of the node plus some other nodes. These can be used together to reproduce the functionality of the standard node, or in other creative combinations.
The extra nodes in clipseg_adv are:
- Text to Mask Advanced (Clipseg) - Enter up to four prompts, then choose whether to combine the resulting masks with a logical "and" or a logical "or", or to output all four masks as separate channels of an RGBA image.
- Clipseg Mask Hierarchy - Select objects in foreground-to-background order to create a segmentation map of distinct areas, each of which can be used as its own region mask.
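
As a rough illustration of what the two advanced nodes compute, the sketch below combines binary masks with "and"/"or" and splits overlapping masks into a foreground-first hierarchy. The function names and the per-pixel min/max rule are assumptions made for this example; the actual implementation in clipseg_adv.py may work differently.

```python
# Illustrative sketch only: names and combination rules are assumptions,
# not taken from clipseg_adv.py. Masks are lists of rows of 0/255 values.

def combine(masks, mode="or"):
    """Combine same-sized binary masks pixel by pixel.

    "and" keeps a pixel only where every mask is white (per-pixel minimum);
    "or" keeps it where any mask is white (per-pixel maximum).
    """
    if mode not in ("and", "or"):
        raise ValueError("mode must be 'and' or 'or'")
    pick = min if mode == "and" else max
    return [[pick(pixels) for pixels in zip(*rows)] for rows in zip(*masks)]

def mask_hierarchy(masks):
    """Split masks, ordered foreground first, into non-overlapping regions.

    Each pixel is assigned to the first mask in the list that claims it,
    so later (background) masks lose any area covered by earlier ones.
    """
    height, width = len(masks[0]), len(masks[0][0])
    claimed = [[False] * width for _ in range(height)]
    regions = []
    for mask in masks:
        region = [[0] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                if mask[y][x] and not claimed[y][x]:
                    region[y][x] = 255
                    claimed[y][x] = True
        regions.append(region)
    return regions

if __name__ == "__main__":
    person = [[255, 255, 0, 0]]  # made-up one-row masks
    chair = [[0, 255, 255, 0]]
    print(combine([person, chair], "and"))   # [[0, 255, 0, 0]]
    print(combine([person, chair], "or"))    # [[255, 255, 255, 0]]
    print(mask_hierarchy([person, chair]))   # [[[255, 255, 0, 0]], [[0, 0, 255, 0]]]
```

In the hierarchy example the overlapping pixel goes to `person` because it is listed first, which is the sense in which the order runs from foreground to background.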
You can download the files here: https://github.com/dwringer/composition-nodes/
Note:
Currently, the version of the Transformers library (4.46.3) that is pinned for the InvokeAI package has a regression that causes the Clipseg nodes to fail with the error "ValueError: Input image size (352x352) doesn't match model (224x224)." This is fixed at least as early as Transformers 4.48.3, which can be installed by activating your InvokeAI .venv and running:
uv pip install transformers==4.48.3
This will fix the Clipseg nodes, although upgrading may have unintended consequences elsewhere.
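
If you are not sure which Transformers version your environment has, a small check like the following can be run with the InvokeAI .venv active. It is only a sketch: the helper names are made up for this example, and the only version facts it relies on are the two stated above (4.46.3 is affected, 4.48.3 works). It says nothing definite about versions in between or older ones.

```python
# Illustrative sketch only: helper names are hypothetical.
import re
from importlib.metadata import PackageNotFoundError, version

KNOWN_GOOD = (4, 48, 3)  # version reported above as working

def parse(version_string):
    """Turn '4.48.3' (or '4.49.0.dev0') into a tuple such as (4, 48, 3)."""
    numbers = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)

def describe(installed):
    """Return a short human-readable status for an installed version string."""
    if installed is None:
        return "transformers is not installed in this environment"
    if parse(installed) >= KNOWN_GOOD:
        return f"transformers {installed}: at or above 4.48.3, reported to work"
    return f"transformers {installed}: older than 4.48.3, the Clipseg nodes may fail"

def installed_transformers():
    """Look up the installed transformers version, or None if it is absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    print(describe(installed_transformers()))
```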