Date: 2024-02-05

I find it frustrating when a website locks up a dataset behind a query-based search box. It makes sense only when the data are proprietary, and then only if the developer applies sufficient security precautions to prevent mass downloads.

To be clear: when you provide a dataset, a .txt or .pdf file is excellent for human eyes to read, but a .pdf usually requires text extraction, or OCR analysis if it is a scanned image, to recover the data that the developer already had in a perfectly good database.

I find that a pipe-delimited .csv file is the simplest format in which to receive data. On my websites, for example, a cemetery burial list of perhaps 5000 persons, with last name, given names and other details, can easily be added to a compilation of many cemeteries, where a researcher can look and learn that Uncle Joe died in Montana and was buried there. Without the compilation, the researcher wouldn’t know which cemetery website to visit. Let the cemetery run the cemetery and the cemetery website, and let me and other web developers download the burial lists and compile them into big databases which search engines will crawl.
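To show how little work the compiling takes, here is a minimal Python sketch that merges several pipe-delimited burial lists into one searchable list. The column names (`last_name`, `given_names`, `died`) and the cemetery names are assumptions for illustration only; a real list would use whatever headers the cemetery publishes.

```python
import csv
import io


def read_burials(text, cemetery):
    """Parse one pipe-delimited burial list and tag each row with its cemetery."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|")
    return [{**row, "cemetery": cemetery} for row in reader]


def compile_burials(sources):
    """Merge many burial lists (cemetery name -> file text) into one sorted list."""
    rows = []
    for cemetery, text in sources.items():
        rows.extend(read_burials(text, cemetery))
    # Sort by last name, then given names, ignoring case.
    rows.sort(key=lambda r: (r["last_name"].lower(), r["given_names"].lower()))
    return rows


def find_by_last_name(rows, last_name):
    """Return every compiled row whose last name matches, ignoring case."""
    wanted = last_name.strip().lower()
    return [r for r in rows if r["last_name"].lower() == wanted]


# Hypothetical sample data standing in for two downloaded files.
sources = {
    "Hillside Cemetery": "last_name|given_names|died\nSmith|Joseph A.|1952\nAdams|Mary|1940\n",
    "Prairie Rest": "last_name|given_names|died\nSmith|Anna|1961\n",
}

compiled = compile_burials(sources)
for row in find_by_last_name(compiled, "smith"):
    print(row["given_names"], row["last_name"], row["died"], row["cemetery"])
```

In practice the file text would come from each cemetery's download link rather than from strings in the script, but the merging step stays the same.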

Trapping the data behind a query-based search box makes sense only if they are proprietary, and even then the search box does nothing to protect the data unless there are sufficient safeguards. Some sites return the whole dataset when the end-user simply hits the SEARCH button without entering any search terms.
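As a rough illustration of the most basic safeguard, the Python sketch below refuses empty or very short queries and caps the number of rows returned. The function name, the limits and the `last_name` field are all assumptions, and this check alone would not stop a determined scraper; it only closes the empty-search hole described above.

```python
MAX_RESULTS = 50        # assumed cap on rows returned per query
MIN_QUERY_LENGTH = 2    # assumed minimum number of characters in a search term


def search(records, query, max_results=MAX_RESULTS):
    """Return at most max_results records whose last name contains the query."""
    term = query.strip().lower()
    # An empty or too-short query must never fall through to "return everything".
    if len(term) < MIN_QUERY_LENGTH:
        return []
    matches = [r for r in records if term in r["last_name"].lower()]
    return matches[:max_results]


# Hypothetical records for demonstration.
records = [{"last_name": "Smith"}, {"last_name": "Smithson"}, {"last_name": "Adams"}]

print(search(records, ""))        # empty search yields nothing
print(search(records, "smith"))   # both Smith and Smithson match
```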

When the data are public, it is best to provide one link to download a simple .txt file or .pdf file for human eyes, and another link to download the same data as a pipe-delimited .csv file for anybody who wants the whole dataset. A search box can help humans who want specific records, but remember that a Google-based search box will yield only the results that are crawled and indexed by Google. News media should provide a table of contents by the date the story ran.
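For a developer who already has the records in a database, producing the pipe-delimited download is a few lines of work. The sketch below uses Python's standard `csv` module; the field names are again assumed for illustration.

```python
import csv
import io


def to_pipe_delimited(records, fieldnames):
    """Serialize a list of dicts as pipe-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="|", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical records standing in for rows pulled from the database.
records = [
    {"last_name": "Smith", "given_names": "Joseph A."},
    {"last_name": "Adams", "given_names": "Mary"},
]

print(to_pipe_delimited(records, ["last_name", "given_names"]))
```

The `csv` module wraps any field that itself contains a pipe character in quotes, so the file stays parseable even when the data are messy.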