In March 2023, a federal judge ruled against the Internet Archive in Hachette v. Internet Archive, finding that the Archive’s “Controlled Digital Lending” program violated copyright law. The Second Circuit upheld the decision on appeal in September 2024, and the Internet Archive chose not to seek Supreme Court review, ending the case.

You should care because this case is about to collide headfirst with AI—and libraries are caught in the middle.

What Actually Happened

The Internet Archive ran Controlled Digital Lending (CDL). They’d scan physical books they owned, lend the digital copies on a one-to-one basis (one digital loan per physical copy), and argue that the whole practice was fair use.

Publishers sued. They argued CDL was piracy with extra steps.

The court agreed. Judge John G. Koeltl ruled that CDL wasn’t fair use because:

  1. The Internet Archive was making copies of entire works
  2. The digital copies directly substituted for licensed e-book sales
  3. The market harm was significant
  4. The nonprofit nature didn’t save them

The Second Circuit upheld the ruling in September 2024. CDL is dead. The Internet Archive owes publishers an undisclosed payment under the negotiated judgment. And every library that had been watching this case as a potential legal foundation for digital lending lost right along with it.

Why This Matters for AI (And Your Library)

AI companies are training their models on copyrighted content. Books, articles, images, code—massive datasets scraped from the internet and used without permission or payment.

Publishers and authors are already suing, and the AI companies are leaning on the same fair use arguments the Internet Archive tried:

  • “We’re transforming the content, not redistributing it”
  • “This serves the public interest”
  • “The original works aren’t being substituted”
  • “This is how AI learns, just like humans learn”

The Internet Archive said similar things. The court didn’t buy it. There’s a good chance courts won’t buy the AI companies’ version either.

Which brings us to libraries.

The Vendor Problem You’re Not Thinking About

Your discovery systems, catalog tools, research databases, and recommendation engines use AI trained on copyrighted content.

If courts decide that’s copyright infringement (not fair use), what happens?

Option 1: The vendor gets sued, loses, and passes costs to you via higher subscription fees.

Option 2: The vendor loses access to training data and has to rebuild the AI with licensed content only—tool gets worse or more expensive.

Option 3: The vendor decides AI features aren’t worth the legal risk and removes them entirely.

None of these are great.

The HathiTrust Exception (And Why It’s Narrow)

In Authors Guild v. HathiTrust (2014), libraries successfully argued that creating a searchable database of scanned books was fair use. The Second Circuit held that full-text search was transformative, that providing access for print-disabled readers was fair use as well, and that neither harmed the market for the books.

But here’s the catch: HathiTrust was narrow.

The court said full-text search was okay. Not full-text display. Not full-text lending. Just search: users could see which pages contained their keywords, but they couldn’t read the text itself.

If your AI tool is summarizing full articles, generating answers based on copyrighted content, or recommending resources by analyzing entire texts… that’s not just search. That’s reproducing and transforming content in ways HathiTrust didn’t cover.

After the Internet Archive case, courts are clearly skeptical of “but we’re a library” as a fair use defense.

Where the AI Lawsuits Stand

Multiple lawsuits are testing whether AI training is fair use:

  • Authors Guild, Sarah Silverman, and others v. OpenAI (filed 2023): Authors claimed OpenAI trained ChatGPT on pirated copies of their books. In late 2025, courts issued mixed rulings—some claims survived, others were dismissed. Case heading toward trial in 2026.
  • Getty Images v. Stability AI (filed 2023): Getty claimed Stability AI scraped millions of copyrighted images. Settlement negotiations ongoing.
  • New York Times v. OpenAI and Microsoft (filed 2023): The Times claimed ChatGPT reproduces Times articles verbatim. This case is moving forward and will likely be the landmark decision.

The pattern is clear: Courts are skeptical of “AI training is fair use.” They’re asking hard questions about whether AI output substitutes for original works, whether AI companies are profiting off unlicensed content, and whether “transformation” is enough when the original work is reproduced in output.

As of early 2026, no court has issued a definitive, binding ruling on whether AI training is fair use. Every library using AI tools built on copyrighted content is operating in legal uncertainty.

The Questions You Need to Ask Your Vendors Right Now

Next vendor meeting, ask:

  1. “What data did you use to train this AI?”

    • If they say “publicly available data” or “internet-scale datasets,” that’s a red flag. “Publicly available” doesn’t mean “licensed for AI training.”
    • If they can’t or won’t tell you, bigger red flag.
  2. “Do you have licenses for the training data, or are you relying on fair use?”

    • If it’s fair use, they’re gambling. And you’re gambling with them.
    • If they’re licensing content, ask to see proof.
  3. “If a court rules that AI training isn’t fair use, what happens to this tool?”

    • Do they have a plan? Will they rebuild with licensed data? Will the tool disappear?
    • Will you get a refund if the tool becomes unusable due to legal issues?
  4. “What happens if you get sued over AI training data?”

    • Are you (the customer) indemnified? Or are you on your own?
    • What’s their legal defense strategy?
  5. “Does your AI reproduce copyrighted content in its output?”

    • If it’s generating full article summaries, reproducing text verbatim, or creating derivative works… that’s a problem.

If your vendor can’t answer these questions clearly, decide: Is the AI feature worth the legal uncertainty?

What You Can Control (And What You Can’t)

Things you can’t control:

  • Whether courts decide AI training is fair use
  • Whether your vendors get sued
  • Whether copyright law changes to accommodate AI

Things you can control:

  • Which vendors you choose (pick ones that are transparent about training data)
  • What contract terms you negotiate (indemnification clauses, exit clauses)
  • How you document your decision-making

The Scenario You’re Not Prepared For

It’s mid-2026. A court rules that AI training on copyrighted content without explicit licenses is infringement. Your discovery system vendor used unlicensed training data for their AI recommendation engine.

Vendor gets sued. They lose. They’re ordered to stop using the AI and pay damages.

Suddenly, the AI features you’ve been relying on disappear overnight. Your patrons are confused. Your staff doesn’t know how to explain it. Your administration is asking why you didn’t see this coming.

Then the vendor sends you a bill—a “compliance surcharge” to cover their legal costs and rebuild the AI with licensed data. It’s 40 percent more than your current subscription. Take it or leave it.

What do you do?

If you don’t have an answer, you’re not ready.

What You Should Do This Week

  1. Make a list of every AI tool you use or are considering
  2. Email your vendors and ask the five questions above
  3. Document their responses (or lack of response)
  4. Decide your risk tolerance: Are you okay using AI tools with unclear legal foundations?
  5. Create a backup plan: If your AI tools disappear tomorrow, what’s your fallback?

This isn’t hypothetical. The Internet Archive case is done. The AI lawsuits are active. Courts are deciding this stuff right now.

Don’t wait until it’s too late.

Email your vendors. Ask the five questions. Document their answers. Protect yourself.


Authenticity note: With the exception of images, this post was not created with the aid of any LLM product for prose or description. It is original writing by a human librarian with opinions.