Fastest way to process 150 MB of data

Hi,

This is purely client side and should work without internet access.

I have about 150 MB of data which can be split into separate HTML files (say 1,300 files) or text files.

I need to provide a type-ahead search for any keyword across the 150 MB of files. Results should display nicely with the matches highlighted for easy navigation (previous/next), and they should come back in under a second. When navigating to an HTML file from the search results, the same keyword should be highlighted throughout that file.

Very simple use case…

This has to be hosted on a server as well, so I need both an offline version and a server version.

What is the best way to achieve this?

thanks.

Try converting the data into a database table, because database searching combined with AJAX to handle the keyboard input is fast:

Online Demo - Source code available at the foot of the page
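
If you go that route, the browser side stays small. A rough sketch of the type-ahead client, where the /search URL and the { results: [...] } response shape are placeholders for whatever the server actually exposes:

```javascript
// Type-ahead client: send the current input to a server-side search
// endpoint backed by the database and render the matches.
// The /search URL and response shape are placeholders.
const input = document.querySelector('#query');
const output = document.querySelector('#results');
let debounceTimer;

input.addEventListener('input', () => {
  clearTimeout(debounceTimer);
  // Debounce so the server isn't hit on every single keystroke.
  debounceTimer = setTimeout(async () => {
    const q = input.value.trim();
    if (!q) { output.innerHTML = ''; return; }
    const response = await fetch('/search?q=' + encodeURIComponent(q));
    const data = await response.json();
    output.innerHTML = data.results.map(r => `<li>${r.title}</li>`).join('');
  }, 200);
});
```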

2 Likes

I’d rather not do it in a database. I want to do it with files or JavaScript itself. Please advise on an approach without a database.

Fast search without a database? You demand the impossible.

1 Like

I don’t think it is impossible. Processing 100 MB with JavaScript should not be an issue. The user imports the files, and JavaScript reads, tokenizes and indexes them and brings up the search results. I have tried lunr.js, but it took 30 seconds, and the browser hangs when I choose huge files.
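
For what it’s worth, lunr is usually workable at that size if the index is built once ahead of time (and serialized), and the loading and searching are pushed into a Web Worker so the page never freezes. A rough sketch along those lines; the file names, document shape and renderResults are placeholders:

```javascript
// search-worker.js -- runs off the main thread so loading and searching
// never freeze the page. Assumes lunr.js and a prebuilt, serialized index
// (prebuilt-index.json, created once with JSON.stringify of a lunr index)
// are served next to the page.
importScripts('lunr.js');

let idx = null;

fetch('prebuilt-index.json')
  .then(r => r.json())
  .then(json => { idx = lunr.Index.load(json); });

onmessage = (e) => {
  // idx.search returns [{ ref, score, ... }]; the page maps refs back to files.
  postMessage({ results: idx ? idx.search(e.data.query) : [] });
};

// --- on the page itself ---
const worker = new Worker('search-worker.js');
worker.onmessage = (e) => renderResults(e.data.results); // renderResults: your own display code
document.querySelector('#query').addEventListener('input', (e) => {
  worker.postMessage({ query: e.target.value });
});
```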

Notepad++ takes a long time for every search, has no type-ahead, and can’t be customized quickly, so let’s try to find the best options.

And what database is the editor working with?

I don’t know, but AFAIK different browsers may use different JavaScript engines and can have differences in both memory limits and script timeouts.

If you know your target users you can stay just below their limits. Depending on efficiency, many may be willing to wait for the script to run.

If things are always going to stay well below the memory and time limits, you could get away with less efficient JavaScript; the inefficiency will be negligible. But if you think there is a possibility you may at some point want bigger and faster, then IMHO at least having a database solution at the ready is a good idea.

1 Like

I would really… REALLY recommend NOT sending that much info down the wire. That’s gonna be a major performance hit on first load.

1 Like

This is 100% not a job for the client. If you’re trying to push this much data to the client at one time then you’re doing it wrong and should figure out another way to accomplish what you’re trying to do, and there is always another way.

Definitely not impossible, I mean look at grep/vim/emacs and to a lesser extent Sublime or VSC.

Out of scope? Probably. It’s not an easy thing to do by any means.

1 Like

What you’re basically going to be talking about here is preprocessing and indexing the data.

You’re probably going to need to investigate things like: what are the common words in the files? What keywords are people likely to search for? (“the” will be the most common word in the files, but people are unlikely to want to keyword-search for “the”…)
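
A rough sketch of that kind of preprocessing in plain JavaScript, assuming the files have already been read into memory as strings; the stop-word list and tokenizer are deliberately simplified placeholders:

```javascript
// Build an inverted index: token -> set of file names containing it.
const STOP_WORDS = new Set(['the', 'a', 'an', 'and', 'or', 'of', 'to', 'in']);

function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// files: array of { name, text }
function buildIndex(files) {
  const index = new Map();
  for (const { name, text } of files) {
    for (const token of tokenize(text)) {
      if (STOP_WORDS.has(token)) continue; // skip words nobody keyword-searches for
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(name);
    }
  }
  return index;
}

// Lookup is then a single Map access instead of a scan of 150 MB of text.
function search(index, keyword) {
  return [...(index.get(keyword.toLowerCase()) || [])];
}
```

The index only has to be built once (ideally ahead of time and saved as JSON), after which each keystroke is just a lookup.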

1300 files? Why in the world would you do this?

You do realize you are asking us to tell you how to do this wrong, don’t you? There is no valid reason you should be taking this approach. At a minimum you should be using SQLite.
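
For what it’s worth, SQLite can also run entirely in the browser via sql.js (SQLite compiled to WebAssembly), so the “no server-side database” constraint can still hold. A rough sketch, assuming the sql.js build in use includes the FTS5 extension; the table and column names are placeholders:

```javascript
// Full-text search in the browser with sql.js + FTS5.
// initSqlJs comes from including sql-wasm.js on the page.
async function setUpSearch(files) {      // files: array of { name, text }
  const SQL = await initSqlJs({ locateFile: f => `https://sql.js.org/dist/${f}` });
  const db = new SQL.Database();

  db.run('CREATE VIRTUAL TABLE docs USING fts5(name, body);');
  for (const { name, text } of files) {
    db.run('INSERT INTO docs (name, body) VALUES (?, ?)', [name, text]);
  }

  // FTS5 handles the indexing, so each search is a quick MATCH query.
  return function search(keyword) {
    const stmt = db.prepare('SELECT name FROM docs WHERE docs MATCH ? LIMIT 50');
    stmt.bind([keyword]);
    const hits = [];
    while (stmt.step()) hits.push(stmt.getAsObject().name);
    stmt.free();
    return hits;
  };
}
```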

You are asking for the best options, yet you are refusing them for some reason.

1 Like

It would help if we knew:

1. the form of the “150 MB of data”
2. operating system
3. your programming language experience

Edit:
For the online version, investigate Google’s search application, which can be limited to a specific URL or path.

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.