AI Product Development
opinion, design, AI, LLM
The past year's advances in image synthesis (DALL-E, MidJourney) and large language models (LLMs, ChatGPT) have been truly exciting and have given many the occasion to look towards the future in a new way. Not to be left out, many designers and engineers have asked how these advances will change the way we design, develop, and manufacture products, asking whether they will help us, pass us by, or obsolete us. At Mechanomy, we're in the first camp, and in this post I'd like to explain why.
Before we get too far into it, I'd recommend this overview by Tim Lee of the recent advances in the large language models (LLMs) behind ChatGPT.
First, it is clear that some of today's tasks are becoming vastly cheaper and will force career changes:
"As a motivating example, let’s look at the simple task of creating an image. Currently, the image qualities produced by these models are on par with those produced by human artists and graphic designers, and we're approaching photorealism. As of this writing, the compute cost to create an image using a large image model is roughly $.001 and it takes around 1 second. Doing a similar task with a designer or a photographer would cost hundreds of dollars (minimum) and many hours or days (accounting for work time, as well as schedules). Even if, for simplicity’s sake, we underestimate the cost to be $100 and the time to be 1 hour, generative AI is 100,000 times cheaper and 3,600 times faster than the human alternative." (a16z)
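The two ratios in the quote follow directly from its own figures. As a minimal check, using only the numbers quoted above (the variable names are mine, and exact fractions are used only to avoid floating-point rounding):

```python
from fractions import Fraction

# Figures taken from the a16z quote; variable names are illustrative.
ai_cost_usd = Fraction("0.001")    # compute cost per generated image
ai_time_s = Fraction(1)            # seconds per generated image
human_cost_usd = Fraction(100)     # the quote's deliberate underestimate
human_time_s = Fraction(60 * 60)   # one hour, expressed in seconds

cost_ratio = human_cost_usd / ai_cost_usd   # how many times cheaper
time_ratio = human_time_s / ai_time_s       # how many times faster

print(f"{int(cost_ratio)} times cheaper, {int(time_ratio)} times faster")
```

Note that both ratios describe a single task, producing one image, which is why the distinction between tasks and jobs matters in what follows.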
I say that 'tasks' are becoming cheaper instead of 'jobs' because there is an important difference between havers-of-ideas and doers-of-things. A graphic artist's job is to communicate ideas visually; they do this through a number of tasks that create the graphic, and only a few of these tasks are actually advanced, if significantly, by the AI generators. For instance, if creating a graphic involves receiving the idea, elaborating emphases, designing the scene and variants, choosing colors, primary painting, layering, and finishing, then only some of these tasks are becoming faster, while others are being sidestepped, rearranged, or skipped over. The image generators appear to perform primary painting well, though they offer very little ability to specify small details like the brand on a soda can. The generator also composes the scene coherently, but only implicitly, because scene design is not separated from painting; there is again very little ability to change specific aspects of the result, like shifting the position of a sofa. AI generation differs significantly from today's workflows, which makes it necessary to understand why certain tasks are performed, whether they are essential, and whether they can be performed under an AI workflow.
Prior to the image generators, the market for new graphic art was limited to those with the resources and need to commission new creations. Image generators have greatly lowered the crude cost of new art, but that lowering comes with costs to quality and customization that may or may not be acceptable.
I argue that as significant as these AI advances are for image generation and language tasks, product development is an altogether different domain that requires much more human consideration.
Designing a new product often feels like a race, not necessarily against formal competitors but against the speed of your own thought and experience. Once you start exploring an idea, the pace of your thought needs to be matched by the pace of your design work. There is a window of time when you most understand what is novel and are able to translate your thoughts into sketches, conversations, words, and other external representations. These actions fix certain aspects of the idea, freeing the mind to work on subsequent aspects. The goal of most design tools is to enable you to do this quickly, to help you move from a vague to a grounded concept as quickly and thoroughly as possible. If the design is driven too quickly, whether by team members, opinionated software, or aggressive deadlines, it becomes hard to give each element sufficient attention. On the other hand, delay these expressions too long and the mind gets bored and frustrated, forced to break conceptual leaps into trivial steps.
If you accept this idealized drama of design, the question for any design tool is how it affects this process and the many tasks within it.
Where can LLMs, image generators, and other AI tools help design engineers?
The most immediate place is in explaining an idea to someone else. (Exaggerating for effect,) I find it laborious and sometimes challenging to codify all of the context about why I am excited for one idea and bored by another. One reason that it can be challenging to explain exciting ideas is that language requires both creativity and a degree of specificity. During the idea stage I devote much of my attention to imagination and observation while avoiding other, unrelated thought modes like diction. When surrounded by unknowns the last thing I want is to open the window to another swarm of unknowns, those of language.
Related to this is the elusive aspect of specificity. Words have particular meanings, and being forced to express a variably-known idea in words demands wording that respects what is known and unknown about the idea. Use too strong a word, or one that your colleague has a stricter conception of than you, and suddenly a perhaps important aspect of the design has been specified, absent any direct consideration or process.
Expressing ideas with the help of an LLM can remove some of these negatives, freeing the designer from composing and iterating on an idea description, while also offering a conversation with a non-judgmental non-person, one who cannot be trapped in a flawed conception of the idea and is at any rate not going to participate in any later phase of the development.
Complementing LLMs is the ability of image generators to synthesize various combinations of concepts. While this is not relevant to every field, it has clear utility in communicating user stories by generating images of people in scenarios to guide development. This can be thought of as the synthesizing of an idea board, the ability to codify and communicate an aesthetic that can pull designers into a fruitful creative space. As this task is only coarsely defined, up until this point it has not made sense to task graphic artists with this creation, relying instead on the designer's sketching. This context communication is well-suited to image generators, as a tool to assist creation and an improvement on prior practices which hopefully results in more and better creation. (In some ways the many errors in generated scenes can reinforce the idea's ambiguity.)
Similarly, image generators can be used outside the development team to also develop customers and elicit their early feedback. Why press the dev team for an early render when user research can generate 10 coarse variants to ground differing use-cases? At least right now, a picture is worth a thousand words; this may change in time, but any improvement to idea communication will help user researchers and marketers.
These are only two aspects of product design - communicating ideas and idea boarding - that can be improved by recent advances, though there will be many more. (I'd especially like to hear from design firms how they're using AI today, leave a reply or email!)
So can AI design products?
In presenting first how AI can help designers and engineers in product development, I mean to illustrate how the tools we have right now can help us do our work more effectively. As with the graphic design example, no designer would describe themselves as an idea board builder; rather, idea boarding is a task designers do in order to understand their users and ideas. Likewise, no product developers are chiefly focused on elaborating ideas in text, as the ability to have or hold the idea is much more widely useful than its mere recitation in text.
What should be clear is that AI is not a totalizing, direct-to-user technology competitive with today's (or tomorrow's) product development processes. As ever, it is a new tool amenable to some tasks that makes developers more capable and productive. With that said, here are three major impediments to AI product development of any sort.
1: No database
First, there is no large database of 3D models with feature descriptions, making it hard to learn relationships between 3D features and their descriptions. As you may know, the advances in image generation are entirely the result of having large image databases of a wide variety of regular scenes with generally accurate and specific labels that describe the scene. In contrast to photo services like Getty, which hosts many professional photos with strong and unambiguous labels, sites like Thingiverse have many fewer 3D models and presently lack descriptions of form and function. Indeed, today's 3D platforms have an altogether different business model than the stock art services, focused on assets for 3D rendering and printing, and both of these are much smaller uses than the every-article-must-have-a-title-photo practice of digital publishing that tied stock art to the rise of online news and commentary. Lacking this database, you cannot train an AI to link feature descriptions to the 3D data that provides that feature.
2: No language
Second, even if you had a database of 3D models and incorporated features, in the direction of services like TraceParts and distributors like McMaster-Carr, the relationships between product specification, geometric feature, and function would remain unknown. "What about this wheel makes it appropriate for a TV cart? Well, certainly its shape, which allows it to roll over the ground while bearing a load, but what do we mean by roll and bear? Well, to roll is to revolve about an axis which can be described mathematically... And to bear a load means here that some load elsewhere is prevented from falling while some other function occurs..."
Of course we can describe a cart with equations and we can apply various tests to the equations to ensure their correctness, but choosing these tests and determining whether they sufficiently and uniquely describe the cart is quite a bit of work. This work has not been done, not because it is impossible but because it has not been needed (yet?). Now, I argue that the advance of machine tools is making these technologies necessary, but we are still at the stage of building a language that describes existing systems and I at least do not see many shortcuts.
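To make 'describing with equations and applying tests' a little more concrete, here is a minimal sketch of two such tests for the cart wheel. Everything in it is my own assumption for illustration: the `Wheel` type, the two function names, the numbers, and the simplifications (rolling without slipping as v = ω·r, and the load shared evenly across wheels). It is not drawn from any existing specification language.

```python
import math
from dataclasses import dataclass


@dataclass
class Wheel:
    """Hypothetical wheel specification; the fields are illustrative only."""
    radius_m: float       # outer radius of the wheel
    rated_load_n: float   # load the wheel is rated to bear, in newtons


def rolls_without_slipping(wheel, angular_speed_rad_s, cart_speed_m_s):
    """'To roll': the hub's speed equals angular speed times radius (v = omega * r)."""
    expected_speed = angular_speed_rad_s * wheel.radius_m
    return math.isclose(cart_speed_m_s, expected_speed, rel_tol=1e-9, abs_tol=1e-12)


def bears_load(wheels, cart_mass_kg, g_m_s2=9.81):
    """'To bear a load': each wheel's rating covers its share of the weight.

    Assumes the weight is shared evenly, which a real cart would not guarantee.
    """
    share_n = cart_mass_kg * g_m_s2 / len(wheels)
    return all(w.rated_load_n >= share_n for w in wheels)


# Illustrative numbers: four 50 mm radius wheels, each rated for 200 N.
wheels = [Wheel(radius_m=0.05, rated_load_n=200.0) for _ in range(4)]
rolling_ok = rolls_without_slipping(wheels[0], angular_speed_rad_s=10.0, cart_speed_m_s=0.5)
light_cart_ok = bears_load(wheels, cart_mass_kg=60.0)
heavy_cart_ok = bears_load(wheels, cart_mass_kg=100.0)
print(rolling_ok, light_cart_ok, heavy_cart_ok)
```

Passing both checks says nothing about whether they sufficiently or uniquely describe a TV cart: a swivel, a brake, or the floor surface appears nowhere in them. Choosing and justifying the set of tests is the work that has not been done.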
(Let's say I was rather surprised to see this promotional animation from a company called ValiSpace. It's a nice vision and some aspects are technically possible today, but I've seen no proof that any of the groundwork has been laid to enable this, nor that one startup is capable of doing so, as they do not need AI to be able to begin utilizing the underlying capabilities.)
Lest I be too dismissive, I do think that it will soon be possible to extract the movement of mechanisms from videos of their action and, having those movements, to search for some combination of first-principles models that might give rise to that motion and thereby 'discover' the recorded mechanism. Associating these extracted movements with the surrounding context of the video may enable the form and function database to be initially populated. There are obvious starting points in the structure-from-motion algorithms of photogrammetry and the techniques of mechanism synthesis, but I haven't heard of anyone working on this.
Again, I'll never argue that computation will always be too expensive, or that brute-forcing all of this analysis will remain beyond us. But overfitting is a very real danger, and at some point it does not matter how the mechanism appears to move or that a derived simulation passes its checks: you need to build the mechanism and determine whether it behaves like the model. Even with lights-out machining, this will still take time to produce and will still not be able to determine whether a variant is useful.
3: No judgment
Third, we do not suffer from having too few ideas or options in designing a product. Rather, the real challenge is determining what is worth prototyping, developing, manufacturing, and bringing to market. Wherever there are unknowns, there is risk, risk that is expensive and consumes calendars. Design and development thus always entail judgment, judgments rooted in understandings of physical reality, user behavior, competitors, etc. These judgments only make sense from a human perspective; if you cannot imagine someone using your product, then you are not going to develop something that they will use. The converse also applies: an AI's inability to imagine a product's use necessarily prevents it from making correct judgments on which products to develop and their features.
In the same way, a notional AI-developed product cannot be inventive. We can certainly use AI to invent something new, but the whole point of being new is that the new thing is not in the existing corpus of things that are generally known. Can I imagine asking AI to swap one feature for another or to combine several separate products into one? Sure. And insofar as these alterations and combinations were included in a brute-force multiplexing-of-things because compute is cheap, well, that's still not really novel.
All of that is moot until one user says: "I want that". That expression is quintessentially human, something that a computer cannot do because computers have no interests or desires and lack agency in any sense.
At the end of the day, engineers build and wield tools; new tools can only increase what we are able to do. The question of generative AI is actually the question of what we desire to invest ourselves in, how we can develop our abilities to help our neighbors; AI can never do this for you.
— Ben Conrad