Thursday, June 18, 2009 at 11:27 PM
Webmaster Level: All
We just added external resource loading to our Flash indexing capabilities. This means that when a SWF file loads content from some other file—whether it's text, HTML, XML, another SWF, etc.—we can index this external content too, and associate it with the parent SWF file and any documents that embed it.
This new capability improves search quality by allowing relevant content contained in external resources to appear in response to users' queries. For example, this result currently comes up in response to the query [2002 VW Transporter 888]:

Prior to this launch, this result did not appear, because all of the relevant content is contained in an XML file loaded by a SWF file.
To date, when Google encounters SWF files on the web, we can:
- Index textual content displayed as a user interacts with the file. We click buttons and enter input, just like a user would.
- Discover links within Flash files.
- Load external resources and associate the content with the parent file.
- Support common JavaScript techniques for embedding Flash, such as SWFObject and SWFObject2.
- Index sites scripted with AS1 and AS2, even if the ActionScript is obfuscated. Update on June 19, 2009: We index sites with AS3 as well. The ActionScript version isn't particularly relevant in our indexing process, so we support older versions of AS in addition to the latest.


54 comments:
Great !
Thanks for sharing this valuable piece of information.
That's great news!!
Hi and great !
Could you send us an example of robots.txt text to block links?
Stephane.
I notice the clickable links in Flash videos such as YouTube embeds don't get crawled. Will this ever be changed? It seems like if someone feels a video is important enough to post, the link to the source of that video would be completely relevant and should count as a link.
How does it work with dynamically loaded XML assets? i.e. Flash loads config.xml file (of which the URL is built into SWF movie) then depending on user interaction it dynamically loads other XML assets whose URL paths are stored in config.xml file.
Would they be indexed as well?
That's good news. I was not using Flash on my cricket website http://www.thecricfanclub.com for this reason.
But I can use it now :)
I think this is huge for flash.
Great news!
But what about ActionScript 3?
Is it not yet correctly interpreted and indexed?
Hmmm, will the indexing result in the external file being the search result or the swf that is using the external file?
Good question from Tomec about indexing db connection xml or other undisclosed information.
Cheers, 1CallService.com
"Index sites scripted with AS1 and AS2"-So what about AS3?
Does this work with all versions of Flash or just more recent ones?
Not much use if AS3 is not supported!
An example of a robots.txt file blocking some .swf and Flash related assets would be as follows:
User-Agent: *
Disallow: /website.swf
Disallow: /data/myData.xml*
Allow: /
In fact, I had to use a similar one a couple of months ago when my XML files gained more PageRank than my very own Flash site.
On the other hand, I didn't like users accessing the SWF files directly. My SWF files need the embedding parameters of SWFObject in order to work properly. So nobody should go straight to the .swf, or she would think that it's a broken site.
That's why filtering swfs and external data files in robots.txt becomes so important.
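As a rough way to sanity-check rules like these before deploying them, the sketch below feeds a similar file to Python's standard `urllib.robotparser` and asks which URLs a crawler may fetch. The host `www.example.com` and the `/index.html` path are made up for illustration, and the trailing `*` is dropped from the XML rule because the standard-library parser does plain prefix matching and does not interpret wildcards. It approximates, rather than reproduces, how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Rules modeled on the comment above; the trailing "*" is omitted because
# urllib.robotparser only does prefix matching (an assumption of this sketch).
ROBOTS_TXT = """\
User-agent: *
Disallow: /website.swf
Disallow: /data/myData.xml
Allow: /
"""


def is_allowed(path, user_agent="Googlebot"):
    """Return True if the rules above let user_agent fetch the given path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder host, not a site from the thread.
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


if __name__ == "__main__":
    for path in ("/website.swf", "/data/myData.xml", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

With these rules the SWF and the XML data file should be reported as blocked, while the HTML page that embeds the SWF stays crawlable.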
Finally! Thanks for the feedback.
Thanks for the example E.S.V - this is good news! My flash site http://benrandall.co.uk/bridging.aspx might actually start appearing in search results now!!
Google is already indexing AS3 content using the Ichabod player. I think the article was implying that it is also indexing AS1 and AS2 content.
This is great news, as all the websites I build use an external preloader to load my main content. Now that content can be indexed.
In general this is a positive step for Adobe and Google, but it's still a long way from having a Flash RIA be as penetrable by Google's crawler as HTML, CSS and JavaScript are. We at Socrata previously built our social data discovery application entirely in Flex/Flash, but in part due to the ongoing SEO challenges we rebuilt the site in HTML, CSS and JavaScript. Socrata is a site for people to find public data and datasets, so it was important that the underlying content be indexed by Google and other search engines. Since relaunching the site the SEO improvement has been substantial and was almost immediate.
Does it index deep content? Let's say that a user clicks on a Flash link/button which in turn loads some external XML content. Does that get indexed?
Also, what happens if my external XML contains a lot of content that is not meant for user display? Would it get indexed?
@Tomek: Yes, the XML assets in your example could be indexed as well.
This is great! Thank you! :D
Great news. Now the matter is to hide unwanted contents from google...
I'm interested to know more about how it indexes URLs found in an XML file. For example, if a XML file contained URLs for external images, would the paths have to be absolute? Or does the indexing have the capability to follow relative paths using the parent document as the root?
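To make the relative-versus-absolute question concrete, here is a small sketch using Python's standard `urllib.parse.urljoin`. It only shows that the same relative path resolves to different URLs depending on which document is taken as the base; it does not claim to show which base Google's indexer actually uses. Every URL in it is a hypothetical placeholder.

```python
from urllib.parse import urljoin

# Hypothetical locations of the three documents involved.
PAGE_URL = "http://www.example.com/gallery/index.html"  # HTML page embedding the SWF
SWF_URL = "http://www.example.com/flash/gallery.swf"    # the SWF file itself
XML_URL = "http://www.example.com/data/images.xml"      # XML file loaded by the SWF


def resolve(path, base_url):
    """Resolve a path found in the XML against the chosen base document."""
    return urljoin(base_url, path)


if __name__ == "__main__":
    for label, base in (("page", PAGE_URL), ("swf", SWF_URL), ("xml", XML_URL)):
        print(label, resolve("images/photo1.jpg", base))
    # An absolute URL is unaffected by the choice of base.
    print(resolve("http://static.example.com/photo1.jpg", PAGE_URL))
```

Because the relative form is ambiguous in this way, an absolute URL removes the guesswork about the base.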
Yes, the robots are crawling as3 now - I can verify that. Here's what I've experienced with Google indexing our flash stuff:
1. don't break apart your text - but it doesn't seem to matter if it's HTML text format or not.
2. keep all your "accessible" text in Movie clips - NOT graphic clips.
3. Google seems to have reduced some of the significance of flash text on home pages - previously Google was returning search results for the content of our swf file - even though that content was minimal on the page, and not directly related to the rest of the site content (client testimonials). Looks like that's been fixed.
4. I would ALWAYS make external links within a flash file absolute, including the http:// - unless you're pulling data back, like an mp3 file, or image. Those I would block with robots.txt
5. I'm sure they're filtering out the config.xml files, since that's included in every swf. But I'll test that out to be sure.
6. links in videos probably won't be crawled if it's in .flv format, like youtube. However, if you have a .swf loading flv's into it - any links in the swf should be recognized. I'm assuming you can "nofollow" those as well.
7. I still have NO intention of building sites entirely in flash - since you can go ahead and say goodbye to any mobile visitors, which happens to be the biggest growth area right now.
8. still, this IS great.
You can prevent google from indexing sensitive strings in your AS code by encrypting the strings either manually, or by using secureSWF's literal strings encryption feature.
Disclosure: I work for Kindisoft.
Wow! I can't wait to see how it changes searches on my photography flash site. I'm so tired of being on page 3-15 on google search
http://www.BorkgrenPhoto.net
There is this site, g2000, that loads some content from XML data, but it seems that Google search doesn't return the respective pages. E.g. for the query [g2000 ang mo kio hub], the following result was not given: http://www.g2000.com.sg/ss09/#/store/2/1...
Does that mean that there is a syntax / structure I would need to follow?
thanks for sharing
What about AMF content?
Will deep linking with SWFAddress be incorporated?
I knew it was coming but now I can tell my clients it's here, thanks!
OK, so Google has indexed text in our Flash photography .swf, which is neat, BUT the Google link to said indexed content only brings up the .swf file, rather than loading the HTML file the .swf is embedded in. Is it just me or is this a bad way to do it? Now the .swf file loads oversized in the browser, which means it scales up over 100%, making the JPEGs look awful. Is there something I can do to prevent this?
Do you make webservice calls and index the returning XML as well?
When you say "We click buttons and enter input, just like a user would," is that limited to Flash components? If I create my own drop down menu from scratch with AS and drawing objects, can your robots really access that?
And how do you decide what type of text gets entered? You guys haven't come up with some kind of AI that's eventually going to subjugate us, have you?
Really cute and all but what if you have a complete site in flash and it can "read" all the external files. What good does that do when one flash-file contains 80 pages and the content is put in an xml-file.
Does it then link to the Flash page? Because that really does no good for searches: I still don't know which of the 80 pages contains the information I am looking for.
Or am I seeing this all wrong? Anyone who knows?
Great! Relevant! important! :)
Thanx!
Lets index it all.
Index flash as3 developers too ;)
dsaliberti
Nice one chiefs!
Let me ask this: this discussion is about indexing files within an SWF, and I'm looking for an answer to a similar question. If I embed my SWF content inside an XHTML file, then add the title of the site and regular keywords to the header, wouldn't Google see it?
I'm also interested in the AMF content. Is it indexed too?
I have a similar question to others. Are *all* links crawled?
We have a link in the context menu that's reached by a navigateToURL call in a ContextMenuEvent handler, and we are not seeing it crawled.
Is it possible it's not being crawled/indexed because it's not a direct part of the display stack?
Here is a full Flash site that uses the SWFAddress script and dynamically changes the URL and the title of the page. You can bookmark any page you want: www.webdesignflash.ro. It also loads the text from the HTML index file and passes the W3C validator test :)
(Please remove my previous comment. I forgot to check "Email follow-up comments.")
I have a Flash/Flex site that Google can't seem to crawl. Does anyone know why this might be? The site is: www.pkavalues.com.
If you Google "pkavalues," then we are there, but there is no content in the search result except for the title (which is in HTML). I was hoping to see the text from the "Welcome" tab, for example.
More details about the site:
--Made in Flex Builder
--Targeting Flash Player 10
--Some text is dynamically loaded from XML.
--Some text is done using Flash/Flex's TLF (Text Layout Framework).
Hey Geoffrey,
We try our best to crawl Flash content but the results can sometimes be less than ideal. You are only seeing a title in the search results for your site because that's the only bit of HTML text that you have outside of your Flash content. You could add a Meta description element to offer more information in HTML. You could also add some other text that's not a part of your Flash content. Just doing this should improve the snippet you see associated with your site in the search results. You might also want to check out the 'Fetch as Googlebot' feature in Webmaster Tools to get a better idea of how your site appears to Google.
Is there a tool that shows me the text Google indexes from my Flash file, and in what order? Like the Webmaster Tools features that do this for HTML?
I find this great. However, as an interface developer, I'm sure Flash is never going to reach the same level of SEO as a fully XHTML/CSS site.
Flash wasn't made for this purpose, and unless things change, it will always be that way.
A well-designed structure must have headers, strong tags for important words, acronyms, and paragraphs. Also, ALT on images and TITLE on links.
Plus, many websites done in Flash don't have a URL for each section (example: a specific URL for the article you're reading).
That makes sharing tough, and sharing is one of the "must haves" nowadays.
I see many Flash frameworks saying they have SEO, while they simply generate a simple XHTML page with some lists and headers. And many times it's just redundant content, which is against SEO.
I don't think Flash should ever be treated as well by search engines as well-structured sites are. That's simply not correct.
Flash is often slow, and often reinvents the wheel with menus that came from Mars - they just break the patterns that people are used to.
That's good for some marketing campaigns and hotsites... But there is too much abuse of it across the internet.
I find that terrible. It's no good for users at all.
Usability and accessibility often go down because people just think Flash is magical.
The above post is simply wrong and misinformed. It is possible to match the optimization level of an HTML website in a full flash design. Having HTML alternate content is completely white hat compliant, as long as it matches the text in the swf. Almost all of the text in my SEO'd Flash websites are taken directly from the alternate content with excellent results. I also incorporate dynamic URLs so each page has its own unique and bookmark-able address.
While there are examples of poorly designed flash sites, there are infinitely more examples of HTML sites that should simply be pulled offline. The stigma that Flash sites cannot be indexed and HTML sites are better at SEO than Flash must be obliterated. It is 2010, and a significant percentage of the population uses a broadband connection, making well designed Flash sites fast and easy to use. Since the design and user interface possibilities exceed anything HTML can offer, and by the end of the year, most mobile devices will be able to display Flash, it is time to accept the reality that Flash sites have grown into a valuable commodity and will only continue to grow in size.
Examples of what I am talking about can be seen here:
http://willworkforfilm.com/#/web-design
Very interesting read. As this particular post was written in summer 2009, I am certain that Google, Yahoo! and Microsoft have since made significant progress in indexing Flash, Flex and SWFObject / SWFAddress content and websites; I have read and commented on several blogs about this one big issue. As the conversation broadens, with more and more people asking the same relevant question, "Can my Flash website be indexed?", the subject gains merit with newbie designers and developers as well as seasoned professionals, giving us more information to research and cross-examine for web do's and don'ts as we try to find the best solutions for our clients, end users and our own edification.
With new development initiatives from technology icons every day, we can look forward to more creative solutions for our SEO woes. Having said that, as designers and developers we must also do our own due diligence and research to find better solutions to this question ourselves, and help others in our profession to advance, rather than arguing over who's right and wrong in a blog. Granted, both sides of this growing conversation have merit, but we must continue to work together toward the common goal: usability, functionality and great design.
As the curiosity of our end users and clients continues to grow in ways never seen in the past, so must our knowledge in our respective fields continue to expand to meet these needs and encompass these engrossing ideas for new business, new services and new products. As Internet professionals there is one thing we should all be able to agree on: evolution is inevitable, and it forces us to evolve with the changing times or slowly get left behind. Don't become a fossil by holding onto the ideology of the past; become the solution by using what you have learned to improve our current processes.
Again, very good read.
Regards,
Rob Busby, robbusby.com
A question:
I'm developing a full Flash site and I need to optimize it for search engines. My plan is this:
the swf file gets ALL the textual data from external XML (or even HTML) files. Do you think this is enough?
Are there any good practices to best create xml file for indexing content?
I.E:
- use tags instead of attribute?
- use absolute path instead of relative path?
Actually, I have an XML file with separate directory paths:
[items main_dir='asset/']
...
[item item_dir='client01/']
[image image_dir='images/']image.jpg[/image]
[/item]
...
[/items]
so in AS3 I build the link: var image = main_dir + item_dir + image_dir + "image.jpg";
Does Google index this type of constructed link?
Can google or adobe give us a standard for xml files and attributes?
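To illustrate what reconstructing such a link involves, here is a minimal Python sketch that parses XML shaped like the snippet above and concatenates the directory attributes the same way the AS3 line does. It assumes the square brackets in the comment stand in for ordinary angle brackets, and the base URL `http://www.example.com/site/` is hypothetical. It says nothing about whether Google performs this reconstruction; it only shows that the final URL never appears literally in the XML file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The commenter's structure, rewritten with angle brackets (an assumption).
CONFIG_XML = """\
<items main_dir="asset/">
  <item item_dir="client01/">
    <image image_dir="images/">image.jpg</image>
  </item>
</items>
"""


def image_urls(xml_text, base_url):
    """Join the directory attributes and file name into full image URLs."""
    root = ET.fromstring(xml_text)
    main_dir = root.get("main_dir", "")
    urls = []
    for item in root.findall("item"):
        item_dir = item.get("item_dir", "")
        for image in item.findall("image"):
            file_name = (image.text or "").strip()
            relative = main_dir + item_dir + image.get("image_dir", "") + file_name
            urls.append(urljoin(base_url, relative))
    return urls


if __name__ == "__main__":
    # Hypothetical base URL for the site that hosts the SWF.
    for url in image_urls(CONFIG_XML, "http://www.example.com/site/"):
        print(url)
```

Storing the complete path in a single element or attribute would let a reader of the XML file see the link without knowing the concatenation logic inside the SWF.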
I am using mySQL/PHP to generate XML files that flash uses as site-structure. I would like to optimise this for SEO.
My client is still not getting any indexing on his site. I created all the type in an external XML doc which shows up fine in the site but apparently is not crawled.
What am I doing wrong?
External XML content not associated to the parent SWF file
Related to the statement: "When a SWF file loads content from some other file - whether it's text, HTML, XML, another SWF, etc. - Google can index this external content too, and associate it with the parent SWF file and any documents that embed it."
Subject website: www.mararu.com
Subject error: www.mararu.com/xml/main.xml
Synopsis: In searches for words on our site (such as mararu + potomac), the search result is www.mararu.com/xml/main.xml, which means that the external resource is indexed by Google but is not associated with the parent SWF file.
This behavior started approximately a month after we launched the site, I'd say about 3 weeks ago. Prior to this, all search results were correctly directed to www.mararu.com (search results).
The mararu.com domain is a few years old.
We did not use any SEO company.
Most of the text of the website is in a folder, xml/main.xml. This text has been indexed.
The text of the Newsroom section is in the public folder, in an xml file. This text hasn't been indexed.(?)
Any suggestions? Thanks so much in advance.
Hi everyone,
Since over a year has passed since we published this post, we're closing the comments to help us focus on the work ahead. If you still have a question or comment you'd like to discuss, feel free to visit and/or post your topic in our Webmaster Central Help Forum.
Thanks and take care,
The Webmaster Central Team