Does Google index .txt files?


It sounds like you’ve done pretty much everything you can right now. Including those txt files in the sitemap is great, but as it has been pointed out numerous times now, your visitors will see just the text file in the search results and not the Flash site itself.

I think the best way to move forward would be to find some sort of scripting language that can read those text files and write the content into the alternate content tags, even if that means finding a new hosting environment. There are loads of them out there for just a few dollars per month that will support this.
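
For what it’s worth, here is a minimal sketch of that idea in Python. It assumes the text files are plain UTF-8 prose with blank lines between paragraphs, and that the host can run Python at all; the function names and the `site.swf` URL are made up for illustration, not taken from the actual site. It reads a text file, escapes it, and places it as fallback content inside the `<object>` element that embeds the Flash movie.

```python
from html import escape
from pathlib import Path


def text_to_paragraphs(text: str) -> str:
    """Turn plain text into escaped <p> elements, one per blank-line-separated block."""
    blocks = [b.strip() for b in text.replace("\r\n", "\n").split("\n\n")]
    return "\n".join(f"<p>{escape(b)}</p>" for b in blocks if b)


def flash_with_fallback(swf_url: str, fallback_html: str) -> str:
    """Wrap fallback HTML inside an <object> element that embeds the SWF.

    Browsers without the plugin (and crawlers) see the fallback HTML instead.
    """
    return (
        f'<object type="application/x-shockwave-flash" data="{escape(swf_url)}">\n'
        f"{fallback_html}\n"
        "</object>"
    )


def render_from_file(txt_path: str, swf_url: str) -> str:
    """Read one content file (hypothetical path) and build the embed markup."""
    text = Path(txt_path).read_text(encoding="utf-8")
    return flash_with_fallback(swf_url, text_to_paragraphs(text))
```

Something like `render_from_file("content/about.txt", "site.swf")` would then be printed into the page template. The same .txt file keeps feeding the Flash movie, so there is still only one copy of the content to maintain.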

Best of luck!

Seems you’re finally seeing the problem with Flash and search engines… Oil and Vinegar. They don’t mix.

Despite the fact that Google, Yahoo!, and MS were given the specs for reading Adobe binaries a year and a half ago, Flash still ranks for crap. Flash sites always have been, are, and will for many years to come be almost 100% dependent on inbound links in order to rank. Even if the content of the .swf can be read by Google, the content will be almost worthless in helping the site rank.

Matt Cutts said to me in person at Pubcon 2008, shortly after the announcement by Adobe that they had given up the keys to the castle, that it will be 10 years before Flash sites rank even relatively well based on their content, and that an HTML site will ALWAYS rank better than an equivalent Flash site.

The search engines are built for indexing HTML, NOT some proprietary 3rd party markup. Their crawlers, indexers, ranking algorithm, PR algorithms, the actual index itself, their tools, their backend systems, etc. have ALL been built around HTML, an open standard not owned by any 3rd party. They are good at indexing HTML as they have spent the last 10-15 years perfecting it (depending on the engine).

Flash has lots of shortcomings when it comes to search engines… the biggest of which is that a site written as a single .swf has ALL of its content indexed under a single URL. Imagine if your Flash site has 100 “pages” and a 3 level deep navigation system. All of that content gets indexed under the site’s home page URL… keyword density is about 1/100th of what it could be if those were 100 HTML pages. You only get 1 <title> for your home page instead of 100 <title> elements for the 100 HTML pages… You don’t get to have <h1>s and <h2>s where you could use 100 <h1>s and even more <h2>s for your 100 HTML pages.

Building a 100 page site entirely in Flash is like taking all of the content from a 100 page HTML site and cramming it into a single home page. It’s terrible from an SEO perspective. It’s always easier to get 100 pages to each rank for a different keyword phrase than it is to get a single page to rank for 100 keyword phrases.

And when you do manage to get a Flash site to rank, it is also terrible from a user perspective. Imagine if you did manage to get it to rank for a keyword phrase for a page 3 levels deep on your 100 page Flash site. What happens when it shows up in the SERPs and someone clicks on the link? They get taken to the home page, NOT the page where the relevant content is located. Now the user has to click around the entire site, maybe never finding the page they are looking for. Bad user experience.

Yes, you can jump through all kinds of hoops to try to get a Flash site to rank… maintain a 2nd version in HTML that you show the engines and those without the plugins… but that is a maintenance nightmare. Twice the work to develop and maintain. You could make 100 HTML pages that each load up their own .swf with just one page of content and leave all global, left, and footer navigation in the HTML… only the “content” of the pages would be in Flash… Again… why? Now any contextual links you have in the Flash are not going to get counted as links.

99% of all Flash sites that I have seen could be written and rendered almost identically in HTML. The reason sites get written in Flash is because designers/developers prefer to work in it. It’s fun! It’s cool! Sorry but it sucks for SEO and it sucks for users.

Anyone who wants a site that ranks in the search engines should avoid Flash like the plague. If you or your client have no desire to “rank” and only want traffic that you get by direct type-in due to print advertising or business cards, then have at it. Personally, I think that every designer/developer out there that even suggests Flash as a platform for building a site is doing their client a huge injustice.

PS: Including those txt files in your sitemap is worthless. Even if they index them (which they likely will) you’ve created again a terrible user experience. They are going to be shown a crappy text page with no kind of navigation to get to the site. It’s like getting a popup to rank.

The absolute BEST thing you could do for your client is scrap the Flash, and rewrite the entire site in HTML IMO.

Canonical - you’re missing the point. The webmaster of this site isn’t arguing that the Flash website should rank as well as an HTML website. The goal is to simply have the content of the Flash site indexed.

Your comment about maintenance is only valid if you’re not dynamically loading content into the Flash file, which this website is doing. In this case, you only maintain one version of the content which resides in a .txt file.

This user has very few options in this situation which are stipulated by the client. The fact of the matter is that while there might be better solutions out there, some of those solutions simply aren’t a choice right now for this client. That being said, we’re working to see if we can make the best of the current situation rather than scrapping the entire project as you suggest.

Which is why I said that an HTML version of the site as an alternative would be much better placed. Flash can easily be placed over the HTML to ensure that the desired effect is given, but for those without Flash and for search engines, you’ll get a much more semantically rich and compliant page (which links to the index page rather than text files). It’s about using the best solution for the job, and text files linked using a sitemap is pretty much reaching for the bottom of the barrel, as it offers no context, semantics or real-world resolution for the non-Flash users. It’s partly why Flash will never be as indexable as HTML: there’s no semantic relevance to content used within Flash files. If it were my project (or client) I would do whatever best meets the visitors’ needs, and text files are a poor substitute. :)

There is not even a “poor” solution for the client’s problem short of:

  1. converting the entire site to HTML and ditching the Flash completely (my suggestion).
  2. converting the entire site to HTML and adding a link on the home page to another page that calls the current SWF for those few who will want to navigate it in Flash.
  3. converting the entire site to HTML pages - keeping the navigation in HTML - and breaking the current single SWF into a separate SWF for each page that only renders the body of the page.
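
To make option 3 a bit more concrete, here is a rough Python sketch that builds one HTML page per “page” of the site. Everything in it is an assumption for illustration: the `build_page` name, the slug-based file naming (`about.html`, `about.swf`), and the idea that each page’s text is available as a plain string. The navigation, `<title>` and `<h1>` stay in HTML, and the per-page SWF only renders the body, with the text as fallback content.

```python
from html import escape


def build_page(slug: str, title: str, body_text: str, nav: list[tuple[str, str]]) -> str:
    """Return one HTML page: HTML navigation, its own <title>/<h1>, and a per-page SWF."""
    # nav is a list of (slug, label) pairs; each becomes a plain HTML link.
    nav_items = "\n".join(
        f'    <li><a href="{escape(s)}.html">{escape(label)}</a></li>' for s, label in nav
    )
    return f"""<!DOCTYPE html>
<html>
<head><title>{escape(title)}</title></head>
<body>
  <ul>
{nav_items}
  </ul>
  <h1>{escape(title)}</h1>
  <object type="application/x-shockwave-flash" data="{escape(slug)}.swf">
    <p>{escape(body_text)}</p>
  </object>
</body>
</html>"""
```

Each page then gets its own URL, title and heading, which is the whole point of splitting the single SWF up.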

Short of those solutions, the client is better off leaving their site as is… without putting content in text files… and simply depending on link building to get them to rank.

In all likelihood, even if you DID put all of the dynamic content in .txt files, it’s not going to rank even for low-competition phrases (they’ll need to be NO-competition phrases), because those pages are going to have only a single link from the sitemap. You don’t want to build links to a .txt file on other sites. That would be an even worse user experience.

The pages won’t have a <title>, won’t have an <h1>, no <h2>s, they won’t have phrases that are emphasized with <b> or <strong>, etc. ALL of the on-page elements that help the search engines figure out what a page is primarily about will be missing. AND you’ve created a terrible user experience as well if you do manage to get them to rank.

If you cannot implement 1 of the 3 methods above then you are better off simply leaving the site AS IS and doing what all purely Flash sites have to do to rank - spend all of your SEO efforts for the site building inbound links from other sites with link text containing the keyword phrases you are targeting - preferably from sites that are relevant and/or high PR sites.

Those are basically your 4 options. I’m sorry… it sucks… but sometimes the ONLY answer is that it can’t be done without completely rewriting your site, because going with Flash was a terrible idea to begin with.

Well, my initial worry was that Google would index text files from the XML sitemap that aren’t visible within the main page (and therefore not intended for users) and might treat them as spam. Perhaps that won’t occur because they’re just text files, but it doesn’t feel right (or feel like it’s going to do the site justice) to “feed” the content in through the side door and have it indexed as unrelated breadcrumbs of the main website (which is how it will appear). :)