Infinity Festival 2019: Technology in Entertainment
Article by Angie Kibiloski
On November 8th, I attended the Infinity Festival, for its 2nd year in Hollywood, CA. This 2-day event, located at Goya Studios and the Dream Hotel, hosted a series of presentations, talks, and panel discussions, all revolving around the ways technology is used to create, distribute, and enjoy entertainment. The speakers were all influential professionals from Hollywood or Silicon Valley, coming together to engage attendees on how these 2 industries have become so tightly interwoven. There were multiple conference tracks, each with a different theme, like Immersive Media, Movie Making, Blockchain, and AI. I found the discussions on AI to be the most interesting, so those are what I’ll be highlighting here. If you’re interested in other topics, or if you’d like to know more about the Infinity Festival or the speakers, you can go to http://infinityfestival.com.
All of the AI presentations were held in the Rorschach Room of the Dream Hotel, which is a comfortable little space with the most bizarre wallpaper, patterned with repeating sequences of Rorschach ink blots. It was odd, though kind of fancy, and gave me something to hold my attention in between speakers. Now, the term AI is a bit misleading, since the depictions of Artificial Intelligence that we see in the movies, and the realities of where the technology is in its practical development, are very different things. The terms that various presenters used more often are “machine learning,” “deep learning,” and “supervised learning,” because the machines still need a human to initiate the programming, and the computer then responds to situations based on a set of parameters. They can then learn to recognize and adapt to these data sets, making logical decisions and inferences. However, the machines can’t really think for themselves yet, and we are many years away from that breakthrough, so we won’t be getting an autonomous robot revolution any time soon. Sorry to disappoint, but though it may seem like it at times, your Alexa is not connected to Skynet.
The first discussion I attended was AI Driven Humans and Content Digitization, featuring Hao Li, who is the CEO of Pinscreen, an ILM alum, motion-capture and facial-tracking superstar, and seemingly all-around cool dude. His industry creds are impressive, and I encourage you to find his full bio on the Infinity Festival Schedule page, by clicking on the title of his discussion. Their site is not organized to allow me to link you directly to individual speaker profiles, but they aren’t too difficult to locate. The topic at hand was our growing need for realistic representations of ourselves in the digital world, in the form of photorealistic 3D avatars. The ability to create detailed, fully rendered human bodies and faces has been available for years in the CGI gaming world, but the process is time consuming and expensive. With emerging technologies, and machine learning, we are now starting to be able to render believable digital humans in real-time. Hao Li spent a very interesting half hour educating us on the software that makes this possible, and the potential applications in both entertainment and e-commerce.
For instance, being able to quickly create a full body 3D avatar of yourself, with just a few photos, right from your phone or computer, and then being able to “try on” different outfits virtually, would make clothes shopping super easy. If you could see how clothing would actually fit your unique body, and how a particular garment’s fabric moves as you move, you could more confidently make purchases from home. It’s currently difficult to consistently render clothing in real-time, but the technology is progressing at a solid pace. One issue is getting the AI to reliably fill in missing information from the source data. If a part of the body in the uploaded photo is hidden, for example, the geometry that the virtual clothing has to move around will need to be inferred from what the AI has learned from past data, and human bodies are not as similar to each other as human faces. It’s easier to simulate faces, because there is a closer match from one to another.
In fact, we can currently do a lot with faces, both in creating accurate virtual renders of our own using multiple source photos, and in translating the expressions and movements of our own faces to a rendering of another, through facial-recognition and motion-capture. The AI will track recognizable geometry of facial features, and use your source movements to puppet the destination avatar. In this way, I can be looking at a camera and making funny expressions, but in the video output, it could be Queen Elizabeth making those expressions. While this can be an entertaining prospect, it also opens up potential risks when this technology is widely available, like the production of “deep fake” videos, and the dangerous ability to almost instantly impersonate another person. Going forward, the industry will have to find ways to protect individuals from having their likenesses used against their will. One safeguarding method might be through the use of hidden “watermarks” encoded in every image we make public, from magazine photos to social media posts, to both deter mischief makers from stealing images, and to detect and flag “deep fake” content that gets released.
Up next was Conversational AI: A New Creative Output, with Dustin Dye, Audrey Wu, and Bant Breen. Dustin Dye is the CEO of Botcopy, a company specializing in creating character and dialogue driven chatbots for companies around the world. Audrey Wu is the VP of Strategic Partnerships at Haptik, one of the largest conversational AI companies. Bant Breen is the founder of Qnary, a company that focuses on professional reputation management and talent branding solutions. Together, they held a lively panel discussion on the pros and cons of interacting with conversational AI as opposed to humans, the limitations of where that technology is today, and its future development potential. They spoke about how chatbots can be used in a variety of industries, to make the consumer’s interactions with a company smoother, more efficient, and consistent from user to user. In entertainment, chatbots are being used for promotional purposes, like the Chucky-bot that interacted with people in the lead up to the most recent Child’s Play movie, getting more sinister in his conversations the closer it got to the film’s release. In e-commerce, chatbots are used on various companies’ customer service sites, for customers who have questions that might not need an actual human to troubleshoot. These sites usually have real people for back-up, in case the issue is more complex than a bot can handle, but for the most part, the chats are bot driven.
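To give a rough idea of how a customer service bot might decide when to answer and when to pass you to a person, here is a toy sketch in Python. It is not how any of the panelists' products work. The intents, keywords, canned replies, and function names are all invented for illustration, and a real system would use a trained language model rather than simple keyword matching.

```python
import re

# Hypothetical intents and trigger keywords, invented for this sketch.
INTENTS = {
    "order_status": {"order", "shipping", "delivery", "track", "package"},
    "returns": {"return", "refund", "exchange"},
    "hours": {"hours", "open", "close", "closing"},
}

# Canned replies, also invented.
REPLIES = {
    "order_status": "I can look that up. What is your order number?",
    "returns": "You can start a return from your account page.",
    "hours": "We are open 9am to 5pm, Monday through Friday.",
}

HANDOFF = "Let me connect you with a human agent."


def classify(message):
    """Return the intent whose keywords overlap the message most, or None."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    best_intent, best_score = None, 0
    for intent, keywords in INTENTS.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


def reply(message):
    """Answer from the canned replies, or hand off to a person."""
    intent = classify(message)
    if intent is None:
        # Nothing matched, so this is a job for the human back-up.
        return HANDOFF
    return REPLIES[intent]


print(reply("Can I get a refund?"))
print(reply("My toaster is making a weird noise"))
```

The first question matches the made-up "returns" intent and gets a canned answer, while the second matches nothing and is handed off to a human.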
The key to making a successful chatbot is the AI’s ability to recognize linguistic text data or verbal cues, depending on whether the chat is text or voice based, to process the customer’s questions and intention, and then believably simulate human conversation in return. A stumbling block is that an AI brain works off logical parameters, but human interactions are based on emotional responses as well. So, a chatbot needs to be able to extrapolate both types of data to fully engage with a human user. This is particularly important for a business that wants to maintain a good relationship with its customers, and wouldn’t want someone to be put off by an ineffectual, or even seemingly rude, chatbot. Take the recent Microsoft social media bot as an example, who became infamous for her speedy descent into racism, all from poorly interpreting textual cues from the users who interacted with her online. Though, considering how social media interactions can devolve, she may have learned all too human behaviors. Chatbot learning will continue to improve and adapt, and our interactions with this type of AI will become more seamless and pervasive, in advertisements, call centers, and even food service, like at certain drive-thrus. Is this a good thing? Will most of our future interpersonal interactions be with AI instead of humans, and will this be better or worse for society?
After a relaxing lunch break, where Press and speakers were treated to a selection of delicious food boxes, in a shaded, turfed patio area, it was back at it in the Rorschach Room. My 3rd presentation of the day was The Case for Digital Humans, a series of 3 intimate discussion groups, all moderated by Jacki Morie, who has spent her career creating VR experiences, including a recent one to help NASA prepare future astronauts for long space missions. The 1st of the 3 groups contained Remington Scott and John Canning. Remington Scott is a founding partner of MacInnes Scott, a company that creates immersive digital worlds, and realistic avatars. A couple of his career claims to fame are bringing Gollum to life with motion capture for Lord of the Rings, and using performance capture to make Call of Duty: Advanced Warfare a more fully realistic gaming experience. John Canning is the Executive Producer at VFX company Digital Domain. During their 20 minutes, they talked about how AI can help humans make an emotional connection with a virtual being, if the AI behind it is able to convey a relatable enough persona. In the future, this could mean providing emotional companionship through an avatar, but in the present, it’s most evident in entertainment. While watching a movie, for instance, where the digital human, or fantastic creature, has been crafted in a way to simulate recognizable emotions, we can connect more fully to them, and be more invested in the story. This has always been the purview of the post-production animators, who spend months in front of a screen, digitally manipulating graphics. Now, another way this could be achieved is by channeling the expressions and mannerisms of a real actor in a motion-capture, or facial-capture rig, and letting the animators make final small tweaks, instead of having to start from scratch.
This both makes the animation process more efficient, and allows for a more true-to-life representation of emotion, especially if the end goal is to simulate the unique expressions of a particular actor.
The 2nd small session belonged to Mike Sanders and Ari Shapiro. Mike Sanders is the Senior Visual Director at Activision’s Central Studios, and is one of the industry leaders in bringing realistic graphics to movies and games. He has had a hand in creating the digital effects and CGI characters in the Star Wars prequels and sequels, Pirates of the Caribbean, and Call of Duty, to name a few. Ari Shapiro is the CEO and Founder of Embody Digital, and formerly spent decades creating visual effects and digital humans for industry powerhouses like ILM and LucasArts. During their allotted time, they kind of jumped all over the place, to be honest. They started by mentioning that as we have broader access to digital avatars in the future, who look and act more and more human, the fact that we can hide behind these realistic avatars could dehumanize more and more of our interactions with actual people. One day we all may have digital versions of ourselves out there, but the key to making a believable avatar would entail capturing the movements and mannerisms that are unique to each individual, instead of just putting a skin over a stock body. They spoke about the need to form an emotional connection with a digital human to make it successful and engaging, and not just an obvious bot, and mentioned that we had this at a very basic level, way back in the 90’s, with the interactive AI of the Tamagotchi digital pets. That creature directly responded to your actions, and you cared if it thrived or died. If we can make an emotional connection with a few pixels, imagine what the future could hold with the right AI.
Rounding out the 3 groups were Paul Debevec and Alexx Henry. Paul Debevec is a Senior Staff Engineer at Google VR, and has spent his career earning a shelf full of awards for his innovation in graphics, seemingly from every conceivable aspect. I encourage you to read his full bio on the Infinity Festival Schedule page. Alexx Henry is the CEO of BlueVishnu, a full-service 3D scanning and avatar creation company. Together, these two touched on how far we are from AI being so seamless that you forget you’re talking to a bot, but how it’s already believable enough for some really interesting applications. I was particularly intrigued by the USC ICT project they mentioned, where video was gathered from Holocaust survivors, who were interviewed for 5 hours a day, for a full week, surrounded by enough cameras to create a holographic projection of each individual, to be displayed in an educational USC exhibit. The end goal was to create interactive versions of these real people, so that visitors to the exhibit could ask them pertinent questions and learn about their real experiences. In this way, the stories of these survivors will live on well after they, themselves, have gone. Alexx Henry shared that he did something similar with his wife, before she succumbed to cancer. Our AI technology is not to the point that her digital avatar would be a true replica of her personality, but it provided a way for their family and friends to be able to speak with her into the future. These types of AI uses are the bright spots in the potential field of applications, bettering our individual and collective lives, by enhancing human connections through technology, and I hope that projects like this will become as prevalent as that of e-commerce chatbots and sci-fi movie characters.
Finally, I ended my conference day with AI Playground: Enhancing Human Creativity, with Rick Champagne from NVIDIA. A good deal of his presentation was going over some of the material I’d heard in other discussions, like the capabilities of current AI, various applications in the content creation field, and how an AI neural-network uses large data sets to learn the predictive tasks it’s going to be asked to carry out. This would have been a great session to start the day with, rather than end it with. He did talk about a few graphical uses for digital artists that weren’t mentioned before. Using predictive AI software, an artist can de-noise and up-res a photo, by using pixel data from other parts of the image to infer what should be added or replaced. This could be used to eliminate noise from lighting variations, debris on the source photo, even text, and to clarify an enlarged image. Other AI can merge 2 images to generate a new, related one. For example, take a photo of your house as the target subject, and a photo of Van Gogh’s Starry Night as a style to apply to it, and you might get a new image of your house depicted in the style of that iconic painting. We were treated to a slideshow of some pretty impressive examples of this, and I wish I could share them with you. Just Google “Photo Style Transfer” and you can read several articles about the algorithms behind this type of AI, which are well beyond my ability to explain.
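The de-noising idea is easier to get a feel for. The sketch below is not the AI approach from the presentation. It is a much older, non-AI trick called a median filter, which I'm using here only to illustrate the general principle of letting nearby pixels decide what a bad pixel should be. The function name and the tiny sample image are made up for this example.

```python
from statistics import median_low


def median_denoise(image):
    """Replace each pixel with the median of its 3x3 neighborhood.

    `image` is a list of rows of grayscale values (0-255).
    Pixels on the edges use whatever neighbors exist.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gather the pixel and its surrounding neighbors.
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            # median_low always returns one of the actual pixel values.
            row.append(median_low(neighbors))
        result.append(row)
    return result


# A flat gray patch with one bright speck of "noise" in the middle.
noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 100, 100],
]
clean = median_denoise(noisy)
print(clean[1][1])
```

Because eight of the nine pixels around the bright speck are gray, the median is gray too, and the speck disappears. The AI tools differ in that they learn from large sets of example images what a clean picture should look like, instead of following one fixed rule.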
I thoroughly enjoyed my session-packed day at Infinity Festival, learning so much about the current state and future of AI, and I will eagerly look forward to next year’s event! All of the speakers are superstars of the industry, inhabiting and thriving in the space where technology and entertainment coincide. I plan to delve deeper into the various speakers’ companies, continuing to learn more about this field, and I encourage you to do the same. As I said at the beginning, AI was only one of the conference tracks at the festival, so if you’re interested in finding out about the others, go visit the Infinity Festival site to read all about them.