Import what your business actually has.
Spreadsheets, PDFs, exports, and structured data, exactly as they arrive: not pre-cleaned, not pre-filtered. Northwind handles the reality of how teams work.
The shapes your data shows up in.
Northwind treats each input as a first-class source. No upstream cleaning required before insights can run.
Documents
PDFs, contracts, reports, and scans. Text and tables are extracted and kept linked to the source page.
Spreadsheets
Excel and Google Sheets. Multi-tab workbooks are preserved, and worksheets can be modeled as joinable sources where supported.
Data exports
CSVs and structured files exported from your existing systems. Headers, types, and encodings are detected automatically.
Multi-sheet handling
Workbooks stay grouped, but sheets are addressable independently, so joins can target any one sheet.
Schema detection
Types are inferred and normalized before insights run, so downstream work isn’t guessing at column shapes.
Quality flags
Anomalies, gaps, and outliers are surfaced early, before they get baked into a number you have to defend.
Encodings & locales
Character encodings, date formats, and number locales are detected so columns line up cleanly across sources.
Source provenance
Each input keeps a link back to where it came from, so downstream surfaces never lose track of the origin.
Real inputs, not idealized ones.
Most ingestion tooling assumes the input is already clean: a single tab, well-typed columns, no merged cells, no headers that span two rows. Real spreadsheets do not look like that. Real PDFs are worse.
Northwind is built around the assumption that the file you have is the file you have. Workbook with thirty tabs and a "summary" sheet that pulls from twelve of them? That structure is preserved. Contract with a clause buried under a scanned page? It is extracted, anchored to a page reference, and made queryable.
You should not have to choose between honoring the source and getting structured data out of it. Northwind keeps both.
- Mixed-type columns: typed without forcing the data.
- Merged cells and multi-row headers: preserved.
- Multi-tab workbooks: each sheet stays addressable.
- PDFs with tables and prose: both extracted, both linked back to pages.
- Encoding and locale quirks: detected, not assumed.
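For readers who want a concrete picture of "typed without forcing the data", here is a minimal Python sketch of the general idea. It is an illustration only, not Northwind's implementation: the function names `classify` and `infer_column`, the type labels, and the single ISO date format are all assumptions made for the example. The column gets the type most of its values agree on, and values that do not fit are reported alongside it rather than coerced or dropped.

```python
from collections import Counter
from datetime import datetime


def classify(value: str) -> str:
    """Return a coarse type label for one raw cell value (hypothetical labels)."""
    text = value.strip()
    if text == "":
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        pass
    try:
        # Assumption: only ISO dates (YYYY-MM-DD) are recognized in this sketch.
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def infer_column(values: list[str]) -> dict:
    """Pick the most common type for a column and list the values that disagree.

    The raw values are never modified; exceptions are returned as
    (row_index, raw_value) pairs so they can be flagged instead of coerced.
    """
    labels = [classify(v) for v in values]
    counts = Counter(label for label in labels if label != "empty")
    dominant = counts.most_common(1)[0][0] if counts else "empty"
    exceptions = [
        (row, values[row])
        for row, label in enumerate(labels)
        if label not in (dominant, "empty")
    ]
    return {"type": dominant, "exceptions": exceptions}


if __name__ == "__main__":
    # A revenue column where someone typed "n/a" and left one cell blank.
    print(infer_column(["120", "95", "n/a", "", "110"]))
```

In this sketch the column is still treated as an integer column, and the stray "n/a" is surfaced with its row position rather than silently turned into a zero or a null.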
Try it on your worst spreadsheet.
The kind of file you would normally rebuild before showing anyone. Drop it in and see what comes out the other side.