Solving SEO with Headless Chrome (Polymer Summit 2017)

SAM LI: Hi, everyone.

I'm Sam Li, and I'm an engineer on the Polymer team.

If you managed to pick up on my accent in the last five words, I am indeed Australian, and so honored to be followed up by Trey, a fellow Aussie, as well.

Prior to joining this team, I'd worked on the beloved Chrome DevTools.

One of my smallest, but maybe my greatest, contributions was adding the ability to rearrange tabs in DevTools.

[APPLAUSE] It's probably the greatest five lines I've ever written.

I did work on other features.

So if you find me afterwards, feel free to ask me about them.

I might share a DevTools trick or two.

More recently I've had the humbling experience of building webcomponents.org and witnessing all the incredible components that all of you have built and published.

For example, the one and only Pokemon selector.

And if you are the person who says, but there's only 151 Pokemon in the original set, well, there's even an option that lets you set that, too.

So all kudos to Sami for this.

It was, however, in the process of building webcomponents.org, which brings us to what we're here to talk about today.

So first, I'm going to cover my story of how I came to encounter this SEO problem while building webcomponents.org.

We'll then look at how I used headless Chrome to solve this before diving into all the details of how that actually works and how you can use it.

So I'm going to take a step back for a moment and talk about what I learned in the process of building webcomponents.org.

The first thing I learned was how the platform supports encapsulation through the use of web components.

With this encapsulation comes inherent code reuse, which leads to a specific architecture.

I also learned a lot about progressive web apps and how they can provide us with fast, engaging experiences.

I learned how the platform provides APIs, such as service workers, to help enable those experiences.

I also learned how to compose web components to build a progressive web app.

We've heard from Kevin yesterday about the PRPL pattern (Push, Render, Pre-cache, Lazy Load) as a method of optimizing delivery of this application to the user.

And one of the architectures which enables us to utilize the PRPL pattern is the App Shell model.

It provides us with instant, reliable performance by using an aggressively cached App Shell.

You can see that for all the requests which hit our server, we serve the entry point file, regardless of the route.

The client then requests the App Shell, which is similar.

But because it's the same URL across the application, we can combine that with a service worker to achieve near-instant loading on repeated visits.

The shell is then responsible for looking at the actual route that was requested and then requesting the necessary resources to render that route.

So at this point I'd learned how to build a progressive web app using client-side technologies like Web Components and Polymer, and how to use patterns such as the PRPL pattern to deliver this application quickly to the user.

Then there's the elephant in the room, SEO.

Some of these bots are basically just running curl with that URL and stopping right there.

No rendering, no JavaScript.

So what are we left with? With this PWA that we built using the App Shell model, we're left with just your entry point file, which has no information in it at all.

And in fact, it's the same generic entry point file that you serve across your entire application.

So this is particularly problematic for Web Components, which require JavaScript to be executed for them to be useful.

This issue applies to all search engine indexers that don't render JavaScript.

But it also applies to the plethora of link rendering bots out there.

There's the social bots like Facebook and Twitter, but don't forget the enormous number of link rendering bots such as Slack, Hangouts, Gmail, you name it.

So what is it about the App Shell model that I'd really like to keep? Well, for me, this approach pushes our application complexity out to the client.

You can see that the server has no understanding of routes.

It just serves the entry point file, and it has no real understanding of what the user is actually trying to achieve.

This allows our server to be significantly decoupled from the front end application, since it now only needs to expose a simple API to read and manipulate data.

The application that we pushed out to the client is then responsible for surfacing this data to the user and mediating user interactions to manipulate this data.

So I asked, can we keep the simple architecture that we know and we love and also solve this SEO use case with zero performance cost? So then we thought, what if we just use headless Chrome to render on our behalf? So here's a breakdown of how that would work.

We have our regular users who are making a request, and they would like a cat picture.

Because who wouldn't? And as part of this approach we ask, are you a robot? And to answer this, we look at the user-agent string and check if it's a known bot that doesn't render.

In this case, the user can render, so we serve the page as we normally would.

The server responds with a fetch cat picture function, and then the client can go and execute that function to get the rendered result.

By the way, this is one of my kittens, which I fostered recently.

She's super adorable.

Now when we encounter a bot, we can look at the user-agent string and determine that they don't render.

And instead of serving that fetch cat picture function, we fire off a request to headless Chrome to render this page on our behalf.

And then we send the serialized rendered response back to the bot so they can see the full contents of the page.
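
The "are you a robot?" check described above can be sketched roughly as follows. This is a minimal illustration, not Rendertron's actual code: the bot list is a small hypothetical sample, and the function and route names are made up for the example.

```typescript
// Illustrative sample of user-agent fragments for bots that are assumed
// not to execute JavaScript. A real deployment would need a longer list.
const NON_RENDERING_BOTS: string[] = [
  "facebookexternalhit",
  "twitterbot",
  "slackbot",
  "linkedinbot",
];

// Returns true when the user-agent string contains one of the known
// non-rendering bot fragments (case-insensitive substring match).
function isNonRenderingBot(userAgent: string | undefined): boolean {
  if (!userAgent) {
    return false;
  }
  const ua = userAgent.toLowerCase();
  return NON_RENDERING_BOTS.some((bot) => ua.indexOf(bot) !== -1);
}

// The two outcomes from the talk: regular users get the app shell,
// non-rendering bots get a page rendered by headless Chrome.
type Route = "serve-app-shell" | "proxy-to-renderer";

function chooseRoute(userAgent: string | undefined): Route {
  return isNonRenderingBot(userAgent) ? "proxy-to-renderer" : "serve-app-shell";
}
```

Matching on substrings keeps the check cheap, at the cost of having to maintain the list by hand as new bots appear.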

So I built a proof of concept of this approach for webcomponents.org, and it worked.

I wrote a Medium post about it, and people were really interested in this approach and wanted to see more of it.

So based on this response, I eventually decided that instead of my hacky solution, I would build it properly.

But then came the most challenging part of any project.

And I know you've all experienced it as well.

Naming.

So I asked in our team chat for some suggestions, and I got a ton.

[LAUGHTER] So these are some of our top ones.

There's some great ones in there.

Power Renders, Use The Platform As A Renderer.

However, today I am very pleased to introduce Rendertron.

Let me render that for you.

[APPLAUSE] Rendertron is a Dockerized headless Chrome rendering solution.

So that's a mouthful, so let's break it down.

First off, what is Docker and why did I use it? Well, no one knows what it means, but it's provocative.

In all seriousness, Docker containers allow you to create lightweight images as standalone executable packages which isolate software from its surrounding environment.

In Rendertron, we have headless Chrome packaged up in this container so that you can easily clone and deploy these to wherever you like.

So what about headless Chrome? It was introduced in Chrome 59 for Linux and Mac, Chrome 60 for Windows, and it allows Chrome to run in environments which don't have a UI, such as a server.

This means that you can now use Chrome as any part of your tool chain.

You can use it for automated testing, you can use it for measuring the performance of your application, generating PDFs, amongst many other things.

Headless Chrome itself exposes a really basic JSON API for managing tabs, with most of the power coming from the DevTools protocol.

All of DevTools is built on top of this protocol, so it's a pretty powerful API.

And one of the key reasons that headless Chrome is great is that now we're bringing the latest and greatest from Chrome to ensure that all the latest web platform features are supported.

With Rendertron, this means that your SEO can now be a first-class environment which is no different from the rest of your users.

So just a quick shout-out.

If this all sounds really interesting to you, and you'd like to include headless Chrome in some other way in your tool chain, there's a brand-new Node library that was published just last week that exposes a high-level API to control Chrome while also bundling all of Chrome inside that Node package.

So you can check it out on GitHub at GoogleChrome/puppeteer.

So I've looked at the high level of how headless Chrome can fit into your application to fulfill your SEO needs.

Now it's time to dive into how it works.

But I've been talking a lot.

So who wants to see Rendertron in action? [CHEERS] All right, so this is the Hacker News PWA created by some of my awesome colleagues, and it's built using Polymer and Web Components.

It loads really fast, and all around performs pretty well.

We can see the separate network requests which load the main content that we see.

And we can guess that it's affected by this SEO problem, since it uses Web Components which require JavaScript, and it pulls in data asynchronously.

So one quick way to verify this is by disabling JavaScript and refreshing the page.

And once we do that, we can see that we still get the app header, since that was in the initial request, but we lose the main content of the page, which isn't good.

So we jump over to Rendertron, a headless Chrome service that is meant to render and serialize this for you.

So I wrote this UI as a quick way to put in the URL and test the output from Rendertron.

So first off, what are we hoping to see? Because these bots only perform one request, we want to see that whole page come back in that one network request.

We also want to see that it doesn't need any JavaScript to do this.

So take a look.

I'm going to put in the Hacker News URL and tell Rendertron to render and serialize this, and that I'm also using Web Components.

And it renders correctly.

I'm going to disable JavaScript and verify that it still works.

So you can see it's still there, and it all comes back in that single network request.

Rendertron automatically detects when your PWA has completed loading.

It looks at the page load event and ensures that it has fired.

But we know that's a really poor indication of when the page has actually completed loading.

So Rendertron also ensures that any async work has been completed, and it also looks at your network requests to make sure they're finished as well.

In total, you have a 10-second rendering budget.

This doesn't mean that it waits 10 seconds, though.

It'll finish as soon as your rendering is complete.

If this is insufficient for you, you can also fire a custom event which signals to Rendertron that your PWA has completed loading.
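
One way to picture the load-detection rules just described is as a single decision over a snapshot of the page. The sketch below is only a model of those rules under stated assumptions: the `PageState` fields and the `shouldSerialize` name are hypothetical, and how Rendertron actually gathers these signals from headless Chrome is not shown.

```typescript
// Hypothetical snapshot of what the renderer knows about the page so far.
interface PageState {
  loadEventFired: boolean;        // the page load event has fired
  pendingNetworkRequests: number; // network requests still in flight
  pendingAsyncTasks: number;      // outstanding async work (timers, promises)
  customDoneEventFired: boolean;  // the page explicitly signalled completion
  elapsedMs: number;              // time spent rendering so far
}

// The 10-second rendering budget mentioned in the talk.
const RENDER_BUDGET_MS = 10000;

// Decides whether the page can be serialized now.
function shouldSerialize(state: PageState): boolean {
  // The budget is an upper bound: once it is spent, stop waiting.
  if (state.elapsedMs >= RENDER_BUDGET_MS) {
    return true;
  }
  // An explicit signal from the page is treated as authoritative here
  // (an assumption of this sketch).
  if (state.customDoneEventFired) {
    return true;
  }
  // Otherwise require the load event plus no outstanding work.
  return (
    state.loadEventFired &&
    state.pendingNetworkRequests === 0 &&
    state.pendingAsyncTasks === 0
  );
}
```

Because the function returns true as soon as the page is idle, a fast page is serialized well before the budget runs out.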

Serializing Web Components is tricky because of Shadow DOM, which abstracts away part of the DOM tree.

So to keep things simple, Rendertron uses Shady DOM, which polyfills Shadow DOM.

This allows Rendertron to effectively serialize the DOM tree so that it can be preserved in the output.

So let's take a look at the news PWA, which we've all seen, and it's also built by some of our other colleagues.

And we'll plug that into Rendertron.

We'll then ask Rendertron to render this as well, and tell it that I'm also using Web Components.

And there we have it.

So what do you need to do to enable this behavior? With Polymer 1 this is super easy, and Rendertron doesn't actually need to do anything.

Simply append dom=shady to the URLs that you pass to Rendertron and Polymer 1 will ensure that Shady DOM is used.

With Polymer 2, and with Web Components v1, it's recommended you use webcomponents-loader.js, which pulls in all the right polyfills on different browsers.

You then set a flag to Rendertron telling it that you're using web components, and it will ensure that the necessary polyfills that it needs for serialization get enabled.

So another feature of Rendertron is that it lets you set HTTP status codes.

These status codes are used by indexers as important signals.

For example, if it comes across a 404, it's not going to link to that page because that would be a really poor search result.

Our server, though, is still returning that entry point file with the status code of 200 OK.

So it looks like every URL exists.

Rendertron lets you configure that status code from within your PWA, which understands when a page is invalid.

Simply add meta tags (dynamically is fine) to signal to Rendertron what the status code should be.

Rendertron will then pick these up and return that status code to the bot.
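
To make the meta tag mechanism concrete, here is a rough sketch of how a renderer could read a status code out of the serialized page. The talk does not name the tag, so the `render:status_code` name used below is an assumption, and `extractStatusCode` is a hypothetical helper, not part of Rendertron's API.

```typescript
// Reads a status code from a tag of the assumed form
//   <meta name="render:status_code" content="404">
// and falls back to 200 when no such tag is present.
// Simplification: the regex expects `name` before `content`; a robust
// implementation would query the DOM instead of matching text.
function extractStatusCode(serializedHtml: string): number {
  const pattern =
    /<meta\s+name=["']render:status_code["']\s+content=["'](\d{3})["']/i;
  const match = pattern.exec(serializedHtml);
  if (!match) {
    return 200;
  }
  return parseInt(match[1], 10);
}
```

With something like this in place, a client-side router that lands on an unknown route can add the tag to the document head, and the bot receives a 404 instead of a misleading 200.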

So this approach isn't specific to Polymer or even Web Components.

Let's plug in fonts.google.com and see what happens when we serialize it.

So that looks pretty good.

Who can guess what JavaScript library was used to build Google Fonts? Angular.

Rendertron works with any and all client-side technologies that work in Chrome and whose DOM tree can be serialized.

The Rendertron endpoint also features screenshot capabilities so that you can check that headless Chrome and the load-detecting function are performing as you expect.

Unfortunately, this service is not fast.

For each URL that we render, we spin up headless Chrome to render that entire page.

So performance is strictly tied to the performance of your PWA.

Rendertron does, however, implement a perfect cache.

This means that if we have rendered the same page within a certain cache freshness threshold, we'll serve the cached response instead of re-rendering it.
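
The freshness-threshold idea can be sketched as a small in-memory cache keyed by URL. This is an illustration under assumptions, not Rendertron's implementation: the `RenderCache` class, its field names, and the injectable clock are all hypothetical, and a deployed service would more likely use a shared datastore than process memory.

```typescript
interface CacheEntry {
  html: string;       // serialized page
  storedAtMs: number; // when it was rendered
}

class RenderCache {
  private entries: { [url: string]: CacheEntry } = Object.create(null);
  private freshnessMs: number;
  private now: () => number;

  // `now` is injectable so the freshness logic can be exercised without waiting.
  constructor(freshnessMs: number, now: () => number = () => Date.now()) {
    this.freshnessMs = freshnessMs;
    this.now = now;
  }

  // Returns the cached render if it is still fresh, otherwise undefined.
  get(url: string): string | undefined {
    const entry = this.entries[url];
    if (!entry) {
      return undefined;
    }
    if (this.now() - entry.storedAtMs > this.freshnessMs) {
      delete this.entries[url]; // stale: force a re-render
      return undefined;
    }
    return entry.html;
  }

  // Stores a freshly rendered page.
  set(url: string, html: string): void {
    this.entries[url] = { html: html, storedAtMs: this.now() };
  }
}
```

A caller would try `get(url)` first and only spin up headless Chrome when it returns `undefined`, then store the result with `set`.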

So how can you get your hands on this today, and how do you use it? Well first, you'll need to deploy the Rendertron service to an endpoint.

You'll need to clone the GitHub repo at GoogleChrome/rendertron.

And it's built primarily for Google Cloud, so it's easiest to deploy there.

But if you remember, this is a Docker container.

So you can deploy this to anywhere which supports a Docker image.

So to make things simple for you to test out, we have the demo service endpoint, which you can hit at render-tron.appspot.com.

And that's the one with the UI that we saw earlier.

It is not intended to be used as a production endpoint.

However, you are welcome to use it, but we make no guarantees on uptime.

Having this as a ready-to-use service is something that we might consider based on the interest received.

So just in case you're wondering, my boss's Twitter handle is @mattsmcnulty, just in case you want to tell him how awesome I am.

So once we have that endpoint up, you're going to need to install some middleware in your application to do the user-agent splitting that I was talking about earlier.

So this middleware needs to look at the user-agent, figure out whether or not they can render, and if not, proxy the request through the Rendertron endpoint.

If you're using prpl-server, which is a Node server designed to serve production applications using PRPL, you simply need to specify the bot proxy option and provide it with your Rendertron endpoint.

If you're using Express, there's a middleware that you can include directly by saying app.use rendertron-middleware with a proxy endpoint and whether or not you're using Web Components.

If you're not using either of these, check the docs for a list of community-maintained middleware.

There's a Firebase function there, as well as a list of existing middleware that Rendertron is compatible with.

If it's not listed, it's also fairly simple to roll your own middleware by simply proxying based on the user-agent string.
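
For anyone rolling their own middleware, the remaining piece after spotting a bot by its user-agent is building the URL to proxy to. The sketch below assumes a `/render/<url>` path and a `wc-inject-shadydom` query flag on the Rendertron endpoint; neither is spelled out in the talk, so check them against the Rendertron docs. `buildRenderUrl` is a hypothetical helper name.

```typescript
// Builds the URL on the Rendertron endpoint that renders `pageUrl`.
// Assumptions (not confirmed by the talk): the `/render/` path prefix and
// the `wc-inject-shadydom` flag for pages that use Web Components.
function buildRenderUrl(
  rendertronEndpoint: string,
  pageUrl: string,
  usesWebComponents: boolean
): string {
  // Drop trailing slashes so the path is not doubled up.
  const base = rendertronEndpoint.replace(/\/+$/, "");
  const flag = usesWebComponents ? "?wc-inject-shadydom=true" : "";
  // Encode the page URL so its own query string stays part of the path.
  return base + "/render/" + encodeURIComponent(pageUrl) + flag;
}

// Example: proxy target for a bot requesting a page on a hypothetical site.
const target = buildRenderUrl(
  "https://render-tron.appspot.com/",
  "https://example.com/a?b=1",
  true
);
```

The middleware would then fetch `target` and relay the status code and body back to the bot, while every other request falls through to the normal app shell response.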

And that's it.

That's all the changes you need to make to use Rendertron today, and all these bots can now be happy.

Rendertron is available to use today, compatible with any client-side technologies, including both Polymer 1 and Polymer 2.

Thank you.