Making 10M government PDF documents searchable – FlowingData

Government organizations love to distribute documents as PDF files. They are easy to forward and to print. The trouble comes later, when you want to find a specific document among millions of others. GovScape, a joint research project of the University of Washington and Boston University, provides a search interface for the PDFs in the End of Term Web Archive’s 2020 crawl.

The code for GovScape is open source and available on GitHub. I have a feeling such a tool will grow more important going forward.
