Update: a few details have been added to the article, such as an official answer from Sketchfab.
Researchers have unveiled Objaverse, a “massive open dataset of text-paired 3D objects”. It contains about 800,000 3D models alongside text descriptions.
This dataset can be downloaded, and it was built from 3D models shared on Sketchfab (an online platform owned by Epic Games). The team only used 3D models shared under a Creative Commons licence. In other words, if you shared 3D models on Sketchfab using a CC licence, they are probably included in Objaverse. This is probably the case even if you used the NoAI tag, which is supposed to prevent any use by AI.
Why create such a dataset?
In their paper Objaverse: A Universe of Annotated 3D Objects, Matt Deitke et al. explain why they created this dataset. They highlight that massive datasets are already available for text and pictures, and that these datasets are the reason why AI has progressed so dramatically in recent months and years. In other words, tools such as ChatGPT or Stable Diffusion wouldn’t be able to create text or pictures without datasets to train them, whether these datasets are open or not, available for commercial use or not.
Until now, only mid-sized datasets of 3D data were available, and they covered a limited diversity of object categories, which of course limited their use.
With large-scale datasets, new AI-powered tools could be created. For example, you could train an AI to create 3D models from a text description, to create LODs or retopologize an asset, to identify what a 3D object is supposed to be, or even to create animations for a 3D character. Such a dataset could also be used in computer vision, not only as training data but also as a benchmark.
Objaverse: objects sourced from Sketchfab
At this stage, you probably realize that a dataset such as Objaverse has a lot of potential when it comes to AI. It can be used both as training data and as a benchmark. To create Objaverse, the researchers explain that they sourced the 3D models, descriptions and tags from Sketchfab. Objaverse contains over 800K assets designed by over 100K artists. 3D scans, 3D models created from scratch and even animated assets are included.
It should be highlighted that this dataset only sourced assets that were shared using a Creative Commons licence (most of them are under a CC BY licence).
What can Objaverse be used for?
Objaverse has just been announced, but it has already been used by several research projects. Text2Tex, for example, is a text-to-texture tool trained using Objaverse.
Matt Deitke, the main author of the paper that introduced Objaverse, gives other examples such as Zero-1-to-3, a system that can create 3D models from a single image.
Questions raised by Objaverse, first reactions
Scraping Creative Commons assets is allowed by the licence itself, but this practice raises a few questions.
Many artists and creators have been uploading 3D models to Sketchfab for a long time, and some of their assets were therefore shared before the rise of AIs. Furthermore, Objaverse doesn’t seem to take into account the “NoAI” tag that can now be used on Sketchfab to openly state that you don’t want your assets to be used to train AIs. Of course, in this situation, the team behind Objaverse wouldn’t be the ones infringing the licence: that would be whoever uses the tagged assets to train an AI. This kind of misuse is also already possible with assets shared on Sketchfab.
We should also highlight that many 3D assets shared using a Creative Commons licence aren’t actually eligible for one. For example, a quick search is enough to find an asset ripped from a Nintendo game and only slightly tweaked by the user who uploaded it. The asset is far too close to the original copyrighted asset to be shared under a CC licence.
When they learned about Objaverse, some artists chose to delete their Sketchfab accounts, while others suggested, probably as a joke, that one way to deal with this matter would be to upload assets “with non-manifold geometry to Sketchfab and label them with common tags”, in order to create bad data. In other words, a dataset scraped from Sketchfab would then be unusable to train an AI. Of course, this would probably be considered spam by Sketchfab and by other users of the platform.
How can I check whether my 3D models are included in this dataset?
The creators of Objaverse have set up an exploration tool, available here. Searching for your Sketchfab handle or typing the name of one of your 3D models should help you check whether it is included or not.
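For readers who would rather check a local copy of the dataset's metadata than use the exploration tool, the lookup could be sketched as follows. This is only an illustration under stated assumptions: the record layout (a "name" field and a nested "user" entry holding a "username") is hypothetical and is not taken from the Objaverse documentation, and the sample records are invented.

```python
from typing import Iterable


def find_my_models(annotations: Iterable[dict], username: str) -> list[str]:
    """Return the names of records uploaded by `username` (case-insensitive).

    Assumption: each record is a dict with a "name" key and a nested
    "user" dict holding a "username" key. This layout is hypothetical.
    """
    wanted = username.strip().lower()
    matches = []
    for record in annotations:
        user = record.get("user") or {}
        if str(user.get("username", "")).lower() == wanted:
            matches.append(record.get("name", "<unnamed>"))
    return matches


# Invented sample records, for illustration only.
sample = [
    {"name": "Low-poly chair", "user": {"username": "Alice3D"}},
    {"name": "Scanned statue", "user": {"username": "bob_scans"}},
    {"name": "Animated robot", "user": {"username": "alice3d"}},
]

print(find_my_models(sample, "alice3d"))  # ['Low-poly chair', 'Animated robot']
```

The comparison is case-insensitive because handles are often typed with different capitalisation than the one stored in the metadata.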
What is Sketchfab’s take on this?
Alban Denoyel, CEO and co-founder of Sketchfab (as a reminder, Sketchfab is owned by Epic Games and will soon be merged into Fab), reacted on Twitter.
His answer highlights four main points:
- He highlights that “those models were mass aggregated by objaverse without [their] knowledge”, and that they “have absolutely zero upside in something like this happening”.
- He also explains that this dataset was created before Sketchfab implemented the NoAI tag, which may explain why it is not taken into account.
- He also underscores that the dataset relies on “CC content set downloadable by users”. In other words, even if they didn’t expect it, they did technically allow this kind of use of their assets.
- Last, but not least, he explains that Sketchfab/Epic Games “are looking into what resort [they] have”.
The official Sketchfab account also published a few tweets on this topic, explaining that they “understand artists’ concerns and are looking into it”.
We asked Sketchfab if they could help us shed some light on this issue, and we will update the article accordingly. We also asked the creators of Objaverse what their plans are: in particular, whether they are going to exclude 3D models that now have a NoAI tag on Sketchfab, and how they are going to handle 3D models that are shared under a CC licence on Sketchfab but are clearly copyrighted.
This situation highlights the fact that “NoAI” tags used by some digital art platforms are not a perfect solution to handle the rise of AIs, since by the time they are implemented, data may have already been scraped. Objaverse also reminds us that uploading an asset under a CC licence can lead to situations unforeseen by the artist.
Last, but not least, this announcement highlights the fact that claiming a tool is trained on non-copyrighted data isn’t enough if the data hasn’t been thoroughly checked. As it stands, Objaverse does include copyrighted material, as well as creations from artists who did share their work under a CC licence but who didn’t want their work used to train AIs. This raises ethical and legal issues. Hopefully the team behind Objaverse will take these concerns into account.
In the meantime, if you have a Sketchfab account, you can add a “NoAI” tag on all your uploads using the Settings/Account page if you wish to do so. This will assign “NoAI” meta tags to all of your past and future uploads and disallow their use by generative AI programs. Of course, this won’t have any impact on the data that might already have been downloaded.
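On the other side of the fence, a team that builds or reuses a scraped dataset could honour the tag by filtering out tagged assets before training. Here is a minimal sketch of such a filter. The record layout is an assumption: we suppose each record carries a "tags" list whose entries are either plain strings or dicts with a "name" key. The way Sketchfab actually exposes its NoAI meta tags may differ, and the sample data is invented.

```python
def has_noai_tag(record: dict) -> bool:
    """Return True if the record carries a "NoAI" tag (case-insensitive).

    Assumption: record["tags"] is a list of strings or of dicts with a
    "name" key. This layout is hypothetical.
    """
    for tag in record.get("tags") or []:
        name = tag.get("name", "") if isinstance(tag, dict) else str(tag)
        if name.strip().lower() == "noai":
            return True
    return False


def exclude_noai(records: list[dict]) -> list[dict]:
    """Keep only the records that do not carry a NoAI tag."""
    return [r for r in records if not has_noai_tag(r)]


# Invented sample records, for illustration only.
records = [
    {"name": "Castle", "tags": [{"name": "medieval"}, {"name": "NoAI"}]},
    {"name": "Tree", "tags": ["nature", "lowpoly"]},
    {"name": "Car", "tags": []},
]

print([r["name"] for r in exclude_noai(records)])  # ['Tree', 'Car']
```

Such a filter only helps if the tags are re-read from the platform at training time: a copy of the metadata scraped before an artist added the tag would not contain it.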
We will keep you informed about upcoming developments regarding Objaverse and AIs. Don’t forget to follow us on YouTube, Instagram, Facebook, Twitter and LinkedIn if you don’t want to miss our latest content.