Description
While the results are clean and impressive, the model appears to hallucinate a lot. Let me give an example: a high-res three-quarter view of a two-door car with alloy wheels.
The wheels in the sample image have a five-spoke snowflake design, but the generated model has six spokes and a different snowflake design. It hallucinates a badge on the hood where there is none in the picture. It invents subtle curves and lines, and omits badges and sculpted curves that are visible. It hallucinates a contour on the windscreen, evocative of a race car, that is not in the picture. It outright hallucinates a "device" attached to the windshield glass that is not in the picture at all.
The car door is missing its rear shut line. Anything not visible in the stimulus (underside, far side, rear diffuser) is of course just invented from what I assume is the model's training input on "car-type objects". So it is as if it takes the platonic idea of a "two-door coupe" and then warps it to approximate the stimulus. This makes me wonder how well it can reproduce objects and geometry that it has not been trained on or encountered (that is, anything other than animals, masks, bipedal figures, common objects, etc.).
Will a further-developed model allow input of multiple images from different angles, and would that resolve the hallucinations?