The Unhinged Librarian
13 min read

Data Extraction Survival Guide: What Actually Gets Stuck

How to audit what data you can actually extract from your vendor. Red flags, escape clauses, and recovery strategies for data that gets stuck.

Why I wrote this: I watched a library discover during cutover that patron checkout history couldn't be exported, so they lost 8 years of behavior data.

Ask about data extraction now, during the contract you already have. Asking at the end of the contract is too late.

Most library vendors will tell you your data belongs to you. It does. But they're counting on you not asking hard questions about how to get it out.

TL;DR
  • Data lock-in is intentional: vendors benefit from patron data they own; libraries lose behavioral analytics, circulation history, and patron profiles when switching vendors.
  • Data extraction barriers: no export API, proprietary formats, licensing restrictions on extracted data, and contractual language claiming vendor ownership of aggregated analytics.
  • Audit required questions during contract negotiation: who owns extracted data, export timelines, format specifications, and what data is genuinely locked (and at what cost to unlock).
  • Recovery strategy if stuck: negotiate extraction in contract termination clause, plan migration timelines before crisis, and maintain parallel data stores in vendor-neutral formats.

I watched a library discover, three months before their migration was supposed to happen, that the patron checkout history they wanted to move to their new system couldn't be extracted. Not "would be hard to extract." Couldn't be extracted. The vendor's export tool didn't support it. The data was stored in a proprietary format. After 8 years of data collection, it was effectively trapped.

The library had to choose: start over with empty transaction histories, or delay their migration 6 months while the vendor tried to build a workaround that wouldn't be available anyway.

That's what this guide is about. Making sure your data doesn't get trapped. Asking the right questions now, before you need to move.

The Data Extraction Reality

Your vendor stores your data. You own the intellectual property (the books, the patron records, etc.). But the vendor controls the format and the export tools.

Most vendors have standard export formats for the things they designed for export: bibliographic records (MARC format), patron records (CSV or standard format). But the custom stuff you've accumulated? Historical data? System configuration? That's where things get sticky.

What Vendors Make Easy to Extract

What Vendors Make Hard to Extract

What Vendors Won't Extract

Questions to Ask Your Vendor RIGHT NOW

Don't wait until your contract is ending. Ask these questions now. If your vendor won't answer clearly, that's a red flag worth escalating.

The Data Export Checklist

Data Type Exportable? Format Cost? Timeline?
Bibliographic Records [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Holdings Records [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Authority Records [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Patron Records (Current) [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Checkout History [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Patron Fines/Fees History [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Holds Queue [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Custom Fields (List: ____) [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Configuration (Circulation Rules) [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Configuration (Item Types) [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____
Deleted/Historical Patron Records [ ] Yes [ ] No [ ] ? _____ [ ] Free [ ] Fee [ ] ? _____

Don't accept "yes" or "no" as final answers. Dig deeper on every row. If they say "exportable," ask what format. If they say "fee," ask the amount.

Deeper Questions

On Format

On Speed

On Cost

On Custom Fields

On Historical Data

The Red Flags

If your vendor says any of these, be very cautious:

A Technical Glossary (What They Actually Mean)

MARC (Machine Readable Cataloging)

Industry standard for bibliographic records. All ILS systems support MARC. If vendor won't export MARC, they're locking you in.

API (Application Programming Interface)

A technical interface that lets external systems read data from the ILS. Good vendors have APIs. Bad vendors don't (because they want you trapped). Ask: "Do you have a public API? Can we write custom scripts to extract data?"

Batch Export

A tool to extract large amounts of data all at once. Standard vendors have this. If they don't, you're extracting one record at a time, which is insane.

Field Mapping

How data from vendor system translates to standard formats. If vendor has a "field mapping tool," they've thought about data portability. If not, you're doing manual conversion.

Backup vs. Export

A backup is a raw copy of the database. An export is a translated version for other systems. Vendors often refuse to give you raw backups (for security reasons, but also to lock you in). You want standard exports, not proprietary backups.

What Actually Gets Stuck: The Uncomfortable Truths

Patron Checkout History

Many vendors store this in vendor-specific format. You can see it in their interface, but you can't export it. Libraries lose years of "who read what" data.

Prevention: During contract negotiation, demand: "Vendor will provide complete checkout history in standard format (CSV or database export) quarterly and at termination."

Custom Item Types

You created 47 custom item types (Book, E-Book, Audiobook, Microfilm, etc.). New system might not support some of these categories. You have to map them manually.

Prevention: Get a list of all custom item types in writing. Ask new vendor if they support them. If not, budget time for remapping all 45K items.

Circulation Policies

You have rules: "Adults: 3-week checkout, 25 item limit. Students: 2-week checkout, 20 item limit." These are stored as vendor-specific configurations. You can't export them; you have to manually rebuild them in the new system.

Prevention: Document your policies in writing NOW. When you migrate, you'll rebuild from this documentation. Budget 2-3 weeks of work.

Fine Calculations

If you have custom fine rules ("Books overdue 7+ days: $0.25/day. Audiobooks: $0.50/day"), you have to rebuild these in the new system. Often the new system calculates fines differently anyway.

Prevention: Document fine rules. Be prepared to simplify during migration (new system might not support complexity you've built).

Authority Control Data

Library of Congress authorities (names, subjects) are standard. But if you've created local authorities, these often can't be exported or don't make sense in the new system.

Prevention: Use LC authorities whenever possible. Minimize local authorities. If you have local authorities, document them and be prepared to lose some during migration.

Patron Deleted Records

You delete patron accounts (patrons who moved, deceased patrons, etc.). Once deleted, many systems can't recover them. If you need to keep historical patron records for auditing or analytics, you need to export them before deleting.

Prevention: Don't actually delete patron records. Mark them as "inactive" instead. Export archived patrons to a separate file quarterly for safekeeping.

Contract Language: The Escape Clauses You Need

Before you sign your next vendor contract, demand this language:

Data Portability Clause

"Upon contract termination, Vendor shall provide Library with complete, machine-readable export of all data including (but not limited to): bibliographic records in MARC21 format, patron records in CSV format, item records with all fields, transaction history, custom configuration, and any other data stored in Vendor's system. Export shall be provided within 30 days of written request, at no additional cost beyond standard contract terms."

Data Format Clause

"All exported data shall be in industry-standard formats (MARC21, CSV, JSON, or equivalent) compatible with standard library systems. Vendor shall not provide data in proprietary formats. If Vendor's system stores data in proprietary format, Vendor shall convert to standard format at Vendor's cost."

Custom Field Clause

"Any custom fields or local configurations created at Library's request shall be documented in writing with field names, definitions, and usage. Upon termination, Vendor shall provide complete export of all custom fields with data values intact."

Historical Data Clause

"Vendor shall maintain and provide upon request: complete checkout history (minimum 7 years), patron fine and fee history, holds queue history, and any other transactional data. Data shall be provided in standard format (CSV, JSON, or database dump)."

Performance & Availability Clause

"Data export shall be completed within 30 days of written request. Vendor shall provide incremental extracts (monthly) at Library's option to facilitate testing and validation before final migration."

Termination Clause

"Upon contract termination, for any reason, Vendor shall make all data available for export at no cost. Vendor shall not require payment of outstanding balances to release data. Library shall have minimum 60-day access to Vendor's system to complete data extraction."

If your vendor won't agree to these, that's a massive red flag. Vendors who protect data portability are vendors confident in their product. Vendors who create extraction barriers are trying to keep you locked in.

Technical Audit: A Checklist for Your IT Staff

If you have an IT person, have them audit your vendor's actual extraction capability:

Recovery Strategies: When Data Gets Stuck

You asked all the right questions, but your vendor still can't extract something you need. Now what?

Option 1: Hire Consultant to Extract

Consultant with database skills can sometimes extract data directly from vendor's database (with permission). Cost: $5-15K. May work if vendor is uncooperative but legally required to give you data.

Option 2: Rebuild Manually

For some data, manual rebuilding is faster than extraction battles. Circulation policies? Rebuild in new system (takes 1-2 weeks). Custom item types? Map them manually (takes 1-2 weeks). Transaction history older than 3 months? Accept you're losing it; start fresh.

Option 3: Delay Migration

If vendor can build extraction tool, sometimes they will, but it takes 3-6 months. Ask: Is delaying worth it? If checkout history is essential for your analytics, maybe yes. If it's nice-to-have, probably no.

Option 4: Legal Escalation

If contract guarantees data access and vendor won't provide, you might have legal recourse. Consult your library's attorney. This is expensive and slow, but vendors hate having lawyers involved.

Option 5: Accept Loss

Some data you'll lose. Historical patron behavior? Often gone. Deleted patron records? Often gone. Archived holdings from 2005? Often gone. Accept this and move on. Have clear conversation with your board: "We're losing X data. Is that ok?" Usually it is.

Prevention Checklist: Before You Migrate


References & Further Reading


Related Reading

Learn more about vendor relationships, contracts, and your data rights:

Filed under: Technology Leadership, Data Portability, Vendor Contracts, System Migration