Harvard Is Releasing a Massive Free AI Training Dataset Funded by Open…

By Editor-In-Chief, December 12, 2024

In addition to the trove of books, the Institutional Data Initiative is also working with the Boston Public Library to scan millions of articles from different newspapers now in the public domain, and it says it’s open to forming similar collaborations down the line. The exact way the books dataset will be released is not settled. The Institutional Data Initiative has asked Google to work together on public distribution, and the company has pledged its support.

However IDI’s dataset is released, it will be joining a host of similar projects, startups, and initiatives that promise to give companies access to substantial and high-quality AI training materials without the risk of running into copyright issues. Firms like Calliope Networks and ProRata have emerged to issue licenses and design compensation schemes that get creators and rights holders paid for providing AI training data.

There are also other new public-domain projects. Last spring, the French AI startup Pleias rolled out its own public-domain dataset, Common Corpus, which contains an estimated 3 to 4 million books and periodical collections, according to project coordinator Pierre-Carl Langlais. Backed by the French Ministry of Culture, the Common Corpus has been downloaded over 60,000 times this month alone on the open source AI platform Hugging Face. Last week, Pleias announced that it is releasing its first set of large language models trained on this dataset, which Langlais told WIRED constitute the first models “ever trained exclusively on open data and compliant with the [EU] AI Act.”

Efforts are underway to create similar image datasets as well. AI startup Spawning released its own this summer called Source.Plus, which contains public-domain images from Wikimedia Commons as well as a variety of museums and archives. Several significant cultural institutions, like the Metropolitan Museum of Art, have long made their own archives accessible to the public as standalone projects.

Ed Newton-Rex, a former executive at Stability AI who now runs a nonprofit that certifies ethically trained AI tools, says the rise of these datasets shows that there’s no need to steal copyrighted materials to build high-performing, high-quality AI models. OpenAI previously told lawmakers in the United Kingdom that it would be “impossible” to create products like ChatGPT without using copyrighted works. “Large public domain datasets like these further demolish the ‘necessity defense’ some AI companies use to justify scraping copyrighted work to train their models,” Newton-Rex says.

But he still has reservations about whether the IDI and projects like it will actually change the training status quo. “These datasets will only have a positive impact if they’re used, probably in conjunction with licensing other data, to replace scraped copyrighted work. If they’re just added to the mix, one part of a dataset that also includes the unlicensed life’s work of the world’s creators, they’ll overwhelmingly benefit AI companies,” he says.
