Thing 6

AI document and data analysis

Last reviewed: March 2026 · 45–60 minutes

Every profession has its version of the same problem: too much to read, not enough time to read it. You can now upload a document (a PDF, a Word file, a spreadsheet) directly into an AI chatbot and have a conversation with it. If Thing 5 was about AI helping you write, this Thing is about AI helping you read.

A 90-page strategy document lands in your inbox the day before a meeting. A spreadsheet of survey responses needs summarising for a report due on Friday. Three different policy documents need comparing to find where they contradict each other. A funding application requires you to digest a stack of background research you haven't had time to look at properly.

You know the drill. You skim. You search for keywords. You read the executive summary and hope it covers the important bits. You open the spreadsheet, stare at it for a while, and then write something based on the columns that seem most relevant. It works, mostly, but you're always aware that you might be missing something buried on page 47 or hidden in a column you didn't think to check.

This is one of the areas where AI is most immediately useful. You can upload a document to a chatbot and ask it what the document says about a specific topic. Ask it to summarise the key findings. Ask it to compare two documents and list the differences. Ask it to pull out every mention of a particular organisation or budget line. The AI reads the whole thing, instantly, and answers based on what's actually in the document rather than its general training data.


How document analysis works in practice

The basic process is straightforward. You upload a file to a chatbot (ChatGPT, Claude, and Gemini all support this) and then you ask questions about it. The AI processes the document and responds based on its contents.

What makes this different from asking a chatbot a general question is grounding. When you ask ChatGPT "what are the main challenges facing social care in Scotland?" without uploading anything, it answers from its training data: a general, probably decent response based on what it absorbed from the internet, but not tied to any specific source. When you upload a particular report and ask the same question, the AI answers based on what that report says. It can point you to specific sections, quote relevant passages, and distinguish between what the document covers and what it doesn't.

This grounding is what makes document analysis so much more reliable than general chatbot queries for professional work. You're not asking the AI to make things up from memory. You're asking it to read something specific and tell you what it found.

What you can upload

The major chatbots accept a range of file types (typically PDFs, Word documents, spreadsheets such as Excel or CSV files, plain text, and images), though size limits and the exact formats supported vary slightly between tools.

What you can ask

The real value of document analysis isn't just "summarise this." It's the ability to interrogate a document conversationally: asking follow-up questions, drilling into specific sections, and getting the AI to do the kind of careful reading that you might not have time for yourself.

Some examples of the kinds of questions that work well:

  • "What does this report say about funding for early intervention services?"
  • "Summarise the methodology section in two paragraphs, in plain English."
  • "What are the three strongest recommendations in this document? Quote the relevant passages."
  • "I need to brief my manager on this. What are the five most important points she needs to know?"
  • "Compare sections 3 and 7. Do they contradict each other on staffing requirements?"
  • "Pull out every statistic mentioned in this document and present them in a table."
  • "What questions does this report leave unanswered?"

For spreadsheets, the questions can be more analytical:

  • "What's the average response time by region?"
  • "Which category had the biggest change between Q1 and Q3?"
  • "Are there any obvious outliers in this data?"
  • "Create a chart showing the trend over time for the top five categories."
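To demystify what happens when you ask an analytical question like "what's the average response time by region?": tools such as ChatGPT's data analysis feature typically answer by writing and running a short Python script over your file. A minimal sketch of that kind of grouped average, using invented data and only Python's standard library:

```python
import csv
import io
from collections import defaultdict

# Invented sample data standing in for an uploaded spreadsheet.
data = """region,response_time_hours
North,4.0
North,6.0
South,3.0
South,5.0
East,8.0
"""

# Group response times by region, then average each group.
times = defaultdict(list)
for row in csv.DictReader(io.StringIO(data)):
    times[row["region"]].append(float(row["response_time_hours"]))

averages = {region: sum(vals) / len(vals) for region, vals in times.items()}
for region, avg in sorted(averages.items()):
    print(f"{region}: {avg:.1f} hours")
```

You don't need to write this yourself; the point is that a chatbot's answer to an analytical question is usually the result of code like this running behind the scenes, which is one reason specific, well-posed questions produce more reliable answers than vague ones.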

Notice the pattern: the more specific your question, the more useful the answer. This is prompt engineering applied to document analysis, exactly the same principles from Thing 3, but directed at a specific piece of content rather than a general topic.


The tools for the job

You don't need a specialist tool for most document analysis tasks. The chatbots you've already been using can handle it. But they have different strengths, and there's one dedicated tool worth knowing about.


Working with your own documents

Here's something important that applies to this Thing more than most: be thoughtful about what you upload.

When you upload a document to a cloud-based AI tool, that document is being sent to the provider's servers for processing. For personal documents, publicly available reports, or content you've created yourself, this is straightforward. But for documents containing sensitive, confidential, or commercially protected information (internal strategy papers, client data, HR files, financial records) you need to think carefully.

The privacy picture varies by tool and by tier. As a general rule, paid tiers offer stronger data protection commitments than free tiers. Some providers on free tiers may use uploaded content to improve their models; paid tiers typically don't. Enterprise agreements go further still, with specific contractual protections around data handling.

Privacy reminder: for this programme, we'll always ask you to work with documents you're comfortable sharing: publicly available reports, content you've created yourself, or sample data. We'll never ask you to upload actual work documents that might contain confidential information. That's a boundary worth maintaining in your wider AI use too, at least until you've checked your organisation's policy on AI and data handling. We'll explore this topic properly in Thing 17.

The supervisor mindset, continued

Thing 4 introduced the supervisor mindset: the idea that when you use AI for research, you're reviewing delegated work rather than doing the research yourself. Document analysis is where this mindset really earns its keep.

AI document analysis is impressive, but it's not infallible. Here's what to watch for.

None of this means document analysis is unreliable. It means it's a tool that works best when used with the same critical eye you'd apply to a summary written by a human colleague: helpful, probably mostly right, but worth checking on anything important.


Resources to explore

ChatPDF

Free, no sign-up required for basic use. Upload a PDF and start asking questions. The simplest way to try document analysis if you want something quick and focused.

Try it free
Claude

Strong with long documents and careful analysis. Free tier available. Upload files using the attachment button in the conversation interface.

Visit Claude
ChatGPT

Particularly strong with spreadsheets and data analysis. Free tier available, though some features (including Advanced Data Analysis) may be limited.

Visit ChatGPT
Google NotebookLM

Free, works best with collections of sources rather than individual documents. Worth bookmarking for Thing 7.

Try NotebookLM
Field Guide to AI: document analysis comparison

Notes on how the major platforms handle document uploads, with regularly updated information on file type support and size limits.

Read guide

Activity: interrogate a report

45–60 minutes · Any chatbot (ChatGPT, Claude, or Gemini)

You're going to upload a real document to an AI chatbot and put it through its paces: asking questions, testing the quality of the answers, and building your instincts for when document analysis is reliable and when it needs checking.

  1. Find a document to work with. You need a PDF report of reasonable length, ideally 15 pages or more, so there's enough content for meaningful analysis. Don't use documents from your workplace that might contain confidential information. Instead, use a publicly available report. Some suggestions:
    • A report from a charity or public body in your sector (most publish annual reports, strategy documents, or research papers on their websites)
    • A government consultation document or policy paper from GOV.UK
    • A research report from an organisation like the Joseph Rowntree Foundation, Nesta, the King's Fund, or similar
    • An audit or inspection report from a regulator in your field

    Choose something relevant to your professional interests if possible. You'll be in a better position to judge the quality of the AI's analysis if you have some knowledge of the subject matter.

  2. Upload and explore. Upload the PDF to your chosen chatbot. Then ask it at least five questions about the document's contents. Aim for a mix of question types:
    • A summary question: "What are the three most important findings in this report?" or "Summarise this document in 200 words for someone with no background in this topic."
    • A specific factual question: "What does the report say about [a particular topic or section]?" or "What statistics does it cite about [something mentioned in the report]?"
    • A comparison or analysis question: "Does the report identify any tensions between [two themes or recommendations]?" or "What evidence does it provide for its main recommendation?"
    • A practical application question: "Based on this report, what three things should a team leader in [your sector] do differently?" or "What would be the strongest argument against the report's conclusions?"
    • A question that tests the boundaries: Ask about something you're fairly sure the document doesn't cover, and see how the AI handles it. Does it say "the document doesn't address this" or does it start improvising from its general knowledge?
  3. Verify and evaluate. For each of the AI's responses, go back to the original document and check. This is the most important step, the one that builds the skills you'll rely on every time you use document analysis professionally. For each response, assess:
    • Was it accurate? Did the AI correctly represent what the document says?
    • Was it complete? Did it miss anything important that the document covers on this topic?
    • Was it appropriately grounded? Did it stick to the document's content, or did it slip into general knowledge without telling you?
    • Would you trust it? If you'd used this summary in a meeting or a briefing, would it have served you well?

    Rate each response on a simple scale: fully accurate, mostly accurate with minor issues, or contained significant errors or omissions.

  4. Try the spreadsheet angle (optional but recommended). If you want to explore data analysis as well, find or create a simple spreadsheet to work with. You could download a publicly available dataset (the Office for National Statistics, data.gov.uk, and many charities publish open data) or create a simple fictional spreadsheet; for instance, 20 rows of made-up survey responses with columns for date, region, satisfaction score, and a text comment. Upload it to ChatGPT (which handles spreadsheet analysis best) and ask a few questions: "What's the average satisfaction score by region?" "Are there any patterns in the data?" "Create a chart showing the distribution of scores."
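If you'd rather generate that fictional spreadsheet than type it by hand, a short script along these lines will produce one (every name, value, and the sample_survey.csv filename here is invented for illustration):

```python
import csv
import random
from datetime import date, timedelta

random.seed(42)  # fixed seed so the file is reproducible

regions = ["North", "South", "East", "West"]
comments = ["Very helpful", "Slow to respond", "No complaints", "Could be clearer"]

# Write 20 made-up survey responses to a CSV file.
with open("sample_survey.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "region", "satisfaction_score", "comment"])
    for i in range(20):
        writer.writerow([
            (date(2025, 1, 1) + timedelta(days=i * 3)).isoformat(),
            random.choice(regions),
            random.randint(1, 5),
            random.choice(comments),
        ])
```

Upload the resulting file to your chatbot and try the example questions above on it.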
Privacy reminder: use publicly available documents or content you've created yourself. Never upload actual work documents that might contain confidential information.

Your output: a document containing:

  • The name and source of the report you analysed (include a link if it's available online)
  • Your five or more questions and the AI's responses (copied or screenshotted)
  • Your accuracy rating for each response, with brief notes on what was right, what was wrong, and what was missing
  • A short reflection (a few paragraphs) on what this exercise taught you about the reliability and usefulness of AI document analysis: when would you trust it, when would you check it, and how might you use it in your work?
  • If you did the optional spreadsheet exercise, your questions, the AI's responses, and any charts it generated

Why this matters

Document analysis is one of the most immediately useful AI skills in professional work. The ability to quickly interrogate a long report, pull out what you need, and check the AI's work is something you'll use regularly, probably more than you expect. This activity gives you structured practice in a safe context (a public document you've chosen yourself), so you build confidence and critical judgment before applying these tools to anything higher-stakes.

The verification step is the most important part. Anyone can upload a document and get a summary. The professional skill is knowing how much to trust that summary, and this exercise is where you start developing that judgment.


Claim your Open Badge

Once you've completed the activity, you can submit your output as evidence for your Thing 6 badge via cred.scot. Your submission should include your questions, the AI's responses, your accuracy assessment for each, and your reflective commentary.

Thing 6: AI document and data analysis

Submit your questions, the AI's responses, your accuracy assessment, and your reflection as evidence to claim this badge via cred.scot.

Claim now

What's next

In Thing 7, we'll take document analysis a step further with AI notebooks: tools that let you build a permanent collection of sources and work with them over time. If Thing 6 was about handing the AI a single report and asking questions, Thing 7 is about giving it a whole shelf of references and letting it draw connections across them. We'll explore Google NotebookLM, which turns your collection of documents into something close to a personal research assistant, one that can even explain your sources back to you in podcast form.