April 12th, 2021 • Featured, Toolbox, Quill Archives
SPJ Journalist’s Toolbox Tool of the Month: Scraping a .PDF
I loathe .PDFs of public records with the power of a thousand suns. They’re a tease. They’re full of data tables but useless to most data journalists in the .PDF format. And government officials love to share them with us because they know a .PDF
Government websites love to bury data in tables on web pages. Why? It satisfies legal requirements for making documents public under sunshine laws, but it renders the data useless. You can’t sort or filter the data to look for trends, do math calculations to find rates and averages, or do the other things journalists need to do to find stories.
Reporters hate transcribing notes, and they often ask me during newsroom training which tools work best. They want speed and accuracy in their transcriptions, and they want the tools to be free (or very cheap). I’ve listed many tools on the Toolbox’s Transcription Tools page, but here are my three favorites for speed, use and cost: Otter.ai:
Editor’s note: This is the first of what will be monthly posts about how to use digital and data tools on Journalist’s Toolbox. Check back each month for new tools, tips and tricks. Google launched its Dataset Search tool in November 2018 to help researchers locate data that is freely available for use.
Victor Hernandez preaches the gospel of newsroom productivity, whether he’s working with his reporters in the Crosscut newsroom in Seattle or training journalists at conferences around the country. Hernandez’s philosophy is simple: Think trends and not tools when finding digital resources that can make you more productive.