Client-Side Performance
How to Evaluate and Improve Client-Side Performance for the Web
What is in it for you?
What you might get out of this
- How to evaluate page performance
- What tools you can use and what they measure
- Performance models and meaning
- How to improve performance
- No realtime tuning exercise
Basic Terms and Rules
Usability, Perception, Interaction, and Performance
- Usability is the capacity of a system to provide a condition for its users to perform the tasks safely, effectively, and efficiently while enjoying the experience.
- Perception or perceived performance is a measure of how fast the site appears
- Interaction performance defines how quickly the page can be interacted with and how it reacts to user input
- Performance is a measure of how fast the site is
- Bad usability might lead to a low performance perception despite a good technical performance
- Bad performance lowers usability most of the time
- Bad interaction timing renders fast performance void
- Fast performance does not equal high perceived performance
Performance Models
The Two Popular Models
RAIL - User Perception Again
A more refined user perception timing model
- 0 to 16ms: Users are exceptionally good at tracking motion, and they dislike it when animations aren't smooth. They perceive animations as smooth so long as 60 new frames are rendered every second. That's 16ms per frame, including the time it takes for the browser to paint the new frame to the screen, leaving an app about 10ms to produce a frame.
- 0 to 100ms: Respond to user actions within this time window and users feel like the result is immediate. Any longer, and the connection between action and reaction is broken.
- 100 to 300ms: Users experience a slight perceptible delay.
- 300 to 1000ms: Within this window, things feel part of a natural and continuous progression of tasks. For most users on the web, loading pages or changing views represents a task.
- 1,000ms or more: Beyond 1,000 milliseconds, users lose focus on the task they are performing.
- 10,000ms or more: Beyond 10,000 milliseconds, users are frustrated and are likely to abandon tasks. They may or may not come back later.
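The perception windows above can be sketched as a small lookup, for example to label measured delays in a report. This is only an illustration: the function name and the labels are made up here, and the handling of the exact boundary values (1,000 ms and 10,000 ms count as "or more") is an assumption.

```python
# Frame budget at 60 frames per second, about 16.7 ms.
FRAME_BUDGET_MS = 1000 / 60


def perceived_delay(ms: float) -> str:
    """Map a delay in milliseconds to one of the perception windows.

    Hypothetical helper; the labels are shorthand for the descriptions above.
    """
    if ms < 0:
        raise ValueError("delay must not be negative")
    if ms <= 16:
        return "smooth animation frame"
    if ms <= 100:
        return "immediate"
    if ms <= 300:
        return "slight perceptible delay"
    if ms < 1000:
        return "natural task progression"
    if ms < 10000:
        return "focus is lost"
    return "frustration, likely abandonment"


if __name__ == "__main__":
    for delay in (10, 80, 250, 700, 3000, 12000):
        print(delay, "ms ->", perceived_delay(delay))
```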
RAIL
A user-centric performance model that breaks down the user's experience into key actions
- Response: Complete a transition initiated by user input within 100ms.
- Animation: Aim for visual smoothness. Produce each frame in an animation in 10ms or less.
- Idle: Maximize idle time to increase the odds that the page responds to user input within 100ms.
- Load: Deliver content and become interactive in under 5 seconds on mobile.
Basic RAIL Guidance
Response
- Respond to user input within 100ms, or else the connection between action and reaction is broken
- For actions that take longer than 100ms to complete, always provide feedback.
Animation
- Produce each frame in an animation in 10ms or less
- Animation is scrolling, dragging, loading indicators, entrance and exit, and fancy UI stuff (yes, mini cart slider for instance)
Idle
- Use time between user actions to prepare more
- Such as loading the rest of the page
Load
- When pages load slowly, user attention wanders, and users perceive the task as broken
- For first loads: load the page and be interactive in 5 seconds or less on mid-range mobile devices with slow 3G connections
- Subsequent loads in under 2 seconds
- On a powerful desktop machine over a fast Wi-Fi, users have grown accustomed to a 1 second loading experience
Web Vitals - The New Kid on the Block
Web Vitals is the new performance model by Google
- Consists of Core Web Vitals (three at the moment) and Web Vitals
- All offer a rating of Good, Needs Improvement, and Poor
- Will evolve over time
- A "good" rating for P75 of page loads is recommended
- Disadvantage: Only works for page loads
Web Vitals - Core
The essential three Core Web Vitals
- Largest Contentful Paint (LCP): measures loading performance. To provide a good user experience, LCP should occur within 2.5 seconds of when the page first starts loading.
- First Input Delay (FID): measures interactivity. To provide a good user experience, pages should have a FID of less than 100 milliseconds.
- Cumulative Layout Shift (CLS): measures visual stability. To provide a good user experience, pages should maintain a CLS of less than 0.1.
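A minimal sketch of how a rating at P75 could be derived from field samples. The "good" limits are the ones listed above; the "poor" limits (4.0 s, 300 ms, 0.25) are Google's published thresholds and not part of this deck. The function names, the nearest-rank percentile, and treating a value exactly on the limit as still good are assumptions made for illustration.

```python
import math

# (good up to, poor above) per metric.
# Units: LCP in seconds, FID in milliseconds, CLS is a unitless score.
THRESHOLDS = {
    "LCP": (2.5, 4.0),
    "FID": (100, 300),
    "CLS": (0.1, 0.25),
}


def p75(samples):
    """Nearest-rank 75th percentile of the given samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def rate(metric, samples):
    """Rate the P75 of the samples as Good, Needs Improvement, or Poor."""
    good, poor = THRESHOLDS[metric]
    value = p75(samples)
    if value <= good:
        return "Good"
    if value <= poor:
        return "Needs Improvement"
    return "Poor"


if __name__ == "__main__":
    print(rate("LCP", [1.2, 1.9, 2.4, 3.1]))
    print(rate("FID", [80, 120, 150, 400]))
    print(rate("CLS", [0.05, 0.2, 0.3, 0.4]))
```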
Largest Contentful Paint (LCP)
Replaced some metrics from the past
- Replaces First Meaningful Paint (FMP) and Speed Index (SI)
- Old metrics too complicated to understand
- LCP represents the rendering of the largest element
- Can be an image or a text block
- Can change, e.g. when a cookie overlay is shown on the first load of the homepage
- LCP within first 2.5 s for P75 of loads
- Only VISIBLE parts of an element within the viewport
- For a resized element, the smaller of displayed size and original size counts
- Element might change while the page loads
- Margin, padding, and borders are not counted
Largest Contentful Paint (LCP) - Important
The element can be different for the same page
First Input Delay (FID)
It is all about user interaction responsiveness
- FID measures the time from when a user first interacts with a page (i.e. when they click a link, tap on a button, or use a custom, JavaScript-powered control) to the time when the browser is actually able to begin processing event handlers in response to that interaction.
- Less than 100 ms for P75 of page loads
- Delays happen because the main browser thread is still doing page loading
- FID is only the delay, not the actual event processing
- Only considers taps, clicks, and key presses, NOT scrolling or zooming (the R of the RAIL model)
- Cannot be measured with normal lab tooling, only with field testing
- Fallback metric: Total Blocking Time (TBT)
Total Blocking Time (TBT) as FID Backup
FID is not easily measurable, hence TBT is a good substitute
- The Total Blocking Time (TBT) metric measures the total amount of time between First Contentful Paint (FCP) and Time to Interactive (TTI) where the main thread was blocked for long enough to prevent input responsiveness.
- The main thread is considered blocked when a task lasts more than 50 ms
- TBT should be less than 300 ms
- TBT is the sum of all (task length - 50 ms) for tasks longer than 50 ms
- Example: 200 + 40 + 105 = 345 ms
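The sum above can be reproduced with a few lines. The task durations in the example are assumed values, chosen so that the blocking portions come out as 200, 40, and 105 ms; the function name is illustrative.

```python
BLOCKING_THRESHOLD_MS = 50


def total_blocking_time(task_durations_ms):
    """Sum the part of every main-thread task that exceeds 50 ms."""
    return sum(max(0, d - BLOCKING_THRESHOLD_MS) for d in task_durations_ms)


if __name__ == "__main__":
    # Assumed main-thread tasks between FCP and TTI; the 35 ms task is
    # short enough not to contribute.
    tasks = [250, 90, 35, 155]
    print(total_blocking_time(tasks))  # 345
```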
Cumulative Layout Shift (CLS)
Measure things that move unintentionally
- CLS measures the sum total of all individual layout shift scores for every unexpected layout shift that occurs during the entire lifespan of the page.
- A layout shift occurs any time a visible element changes its position from one rendered frame to the next.
- Layout shifts only occur when existing elements change their start position
- New elements appearing are not counted as long as no other element moves
- Recommended score < 0.1 for P75 of page loads
Cumulative Layout Shift (CLS) - Calculation
A small example with strong element shifts
- Element covers 50% of the viewport and moves by 25%
- Union of old and new area is the impact fraction = 75% (0.75)
- Moved distance is called distance fraction = 25% (0.25)
- Score = impact fraction * distance fraction (0.75 * 0.25 = 0.1875)
- Only the viewport is considered, overflowing element regions are not considered
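A simplified sketch of the calculation for a single full-width element that shifts vertically. All inputs are fractions of the viewport height, and it assumes the height is the larger viewport dimension (portrait layout), so the real algorithm with arbitrary rectangles and several shifting elements is not covered. The function names are illustrative.

```python
def _visible(start, end):
    """Length of the interval [start, end] that lies inside the viewport (0.0 to 1.0)."""
    return max(0.0, min(1.0, end) - max(0.0, start))


def vertical_shift_score(top, height, moved_by):
    """Layout shift score for one full-width element moved down by `moved_by`.

    Simplification: vertical movement only, values as fractions of the
    viewport height.
    """
    new_top = top + moved_by
    old_area = _visible(top, top + height)
    new_area = _visible(new_top, new_top + height)
    overlap = _visible(max(top, new_top), min(top + height, new_top + height))
    # Union of old and new position; parts outside the viewport are ignored.
    impact_fraction = old_area + new_area - overlap
    distance_fraction = abs(moved_by)
    return impact_fraction * distance_fraction


def cumulative_layout_shift(scores):
    """CLS is the sum of the individual layout shift scores."""
    return sum(scores)


if __name__ == "__main__":
    # Element covering the top half of the viewport moves down by a quarter.
    print(vertical_shift_score(top=0.0, height=0.5, moved_by=0.25))  # 0.1875
```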
Cumulative Layout Shift (CLS) - Example
A small example with strong element shifts
Speed Index - How Does it Work
The Speed Index is often misunderstood
- A metric invented by Webpagetest.org in 2012
- Measures how fast the page content is visually displayed
- Based on Visual Progress from Video Capture
- Calculated by comparing the distance between the histogram of the current frame and the final frame
- There is a perceptual speed index version that uses SSIM (structural similarity index measure) between frames instead of the histogram
- Depends on the view port size
- Depends on what is displayed (changing content over time)
- Comparison rating by Google goes against HttpArchive data
- Requires video capturing
- Does not work well with video on the page or any other moving stuff
Speed Index - Calculation
A small example to understand it better
- From start of loading to visually complete (completeness 0% to 100%)
- 100 ms frames
- Each frame is compared to final frame using a histogram comparison
- Frame score = interval size ms * (1 - (completeness/100))
- Total score (ms) = sum(frame scores)
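The formula above as a short sketch. The completeness values in the example are made up; a real run would derive them from the captured video frames.

```python
def speed_index(completeness_percent, interval_ms=100):
    """Sum of interval * (1 - completeness/100) over all frames.

    `completeness_percent` holds the visual completeness (0 to 100) of each
    frame compared to the final frame.
    """
    return sum(interval_ms * (100 - c) / 100 for c in completeness_percent)


if __name__ == "__main__":
    # Assumed visual progress of five 100 ms frames.
    frames = [0, 0, 50, 80, 100]
    print(speed_index(frames))  # 270.0
```

A page that shows most of its content early gets a lower (better) score than a page that reaches visual completeness at the same time but stays blank for longer.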
Lighthouse Scoring
Scoring is a formula that changes over time
- These parameters go into the final value
- They are adjusted over time (v7 at the moment)
- Mobile and Desktop are different
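The final value is a weighted average of the individual metric scores. The sketch below uses the weights that, to our knowledge, applied to Lighthouse v6 and v7; treat them as an assumption and verify them against the Lighthouse scoring documentation for the version in use. The mapping from a raw metric value to a 0.0 to 1.0 metric score is done by Lighthouse itself and is not reproduced here.

```python
# Assumed weights for Lighthouse v6/v7; they differ in other versions.
ASSUMED_V6_V7_WEIGHTS = {
    "FCP": 0.15,
    "SI": 0.15,
    "LCP": 0.25,
    "TTI": 0.15,
    "TBT": 0.25,
    "CLS": 0.05,
}


def performance_score(metric_scores, weights=ASSUMED_V6_V7_WEIGHTS):
    """Weighted average of metric scores (each 0.0 to 1.0), scaled to 0..100."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * metric_scores[name] for name in weights)
    return round(100 * weighted / total_weight)


if __name__ == "__main__":
    scores = {"FCP": 0.9, "SI": 0.8, "LCP": 0.6, "TTI": 0.7, "TBT": 0.4, "CLS": 1.0}
    print(performance_score(scores))
```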
Lighthouse - Mobile vs. Desktop
Why numbers changed and why numbers are different
Wrong Data
- Lighthouse prior to v6 rated desktop using mobile data from HttpArchive
- v6 now also uses desktop data, so all desktop scores from before are void
Mobile Emulation
- Lighthouse runs mobile-first
- Uses a mobile device emulation
- Slower network, slower CPU (Lantern engine)
- Device emulation tries to be a Moto G4 360x640 px
- Chrome DevTools device emulation != Lighthouse emulation
Changed Numbers
- Lighthouse Extension was available locally before
- Lighthouse now uses remote servers instead
- You can use a Node CLI version instead or DevTools Lighthouse
- Lighthouse "remote" might not be the latest version
- Use Lighthouse CI to be independent
First Contentful Paint (FCP)
Important for tuning LCP
- The First Contentful Paint (FCP) metric measures the time from when the page starts loading to when any part of the page's content is rendered on the screen.
- Less than 1 sec for P75 of page loads
Time to First Byte (TTFB)
Important for tuning FCP in the first place
- Defines when the first byte of the response is received by the client
- There are streaming servers and blocking servers
- Blocking servers deliver the full response at once
- Streaming servers deliver data as soon as it is ready
- Most important first key metric
- If TTFB fails to deliver, no other metric can compensate
- Heavily defines when the user sees first feedback
- Feedback might just be the page turning white
- External solutions might do split page caching to speed up slow servers
And now the conclusion...
The Unfortunate Limited Use of All of That
- Classic websites are fine as long as they do page-loads
- XHR is not measurable without instrumentation, rendering is especially hard
- ElementTimingAPI might help, but has limited reach (not all elements are covered, and neither FID nor resource loading is)
The PWA Challenge
- PWA is either a page load (landing) or resource and render load only
- Limited proper tooling available for subsequent interaction
- Test in two directions:
- First load for new and returning visitors
- First load for SEO
- Interaction navigation via video timing when perceived performance impression is poor
- Identify navigation path with large loads and screen changes
Measuring and Tuning
Finally Let's Measure and Investigate
Measuring Basics
Just some basic measurement information
- Your network determines the performance!
- Your PC makes a difference (load, memory, cpu...)
- Resolution makes a difference
- The site content makes a huge difference
- Home office means different network speeds and providers
- Modern web caches like crazy, caching changes things
- A VPN changes things a lot, same for corporate firewalls
- Never trust one measurement
- Prefer incognito browsers for testing
- Measure more than once
- Be aware of caches or disable them if possible, that depends on the preferred target loading behavior of course
- Servers have cached state as well, be aware of that (database, page cache)
- When testing on production, you are not alone!
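A minimal sketch of "never trust one measurement": collect several runs and look at the median and the spread instead of a single number. The timings in the example are invented, and the function name is illustrative.

```python
import statistics


def summarize(timings_ms):
    """Summarize repeated measurements of the same page load."""
    if len(timings_ms) < 2:
        raise ValueError("measure more than once")
    return {
        "runs": len(timings_ms),
        "median": statistics.median(timings_ms),
        "min": min(timings_ms),
        "max": max(timings_ms),
        "spread": max(timings_ms) - min(timings_ms),
    }


if __name__ == "__main__":
    # Invented load times; the 3400 ms run could be a cold cache or a busy PC.
    print(summarize([1200, 950, 1010, 3400, 990]))
```

The median is less sensitive to a single outlier than the average, and a large spread is a hint that caches, network, or machine load differed between runs.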
Different Navigation Types
When and how a page is loaded makes a difference
- First load: Site and page never visited before, caches are cold
- Repeated load: Reloading the same page
- Returned load: Returning via navigation to the same page
- Navigation load: Going to a different page but having visited another page before (same or different type), e.g. homepage -> product listing page
- First load: Not all caches can be disabled (CDN, DNS), hence disable the browser cache for testing
- Repeated load: Just use normal caching and reload
- Returned load: Can only be measured manually
- Navigation load: Can only be measured manually
- Returned load is hard to test because when testing, it is often just a repeated load (full cache reuse instead of partial reuse)
Cheap Advice First
Question everything first
- Measure your timing first
- Go for largest contentful paint (LCP) and first contentful paint (FCP), visual complete above the fold, and fully loaded (because the progress bar keeps people from doing things)
- Optimize for slower connections on mobile
- Optimize for ok but not blazing fast connections on desktop
- Reevaluate third parties and features
- Understand the critical rendering path
- Users land not only on the homepage!
Use Chrome DevTools - CDT
The Basics
CDT - Timeline
The first area to look at
- Red bars show blocking tasks
- Shows CPU utilization by color: violet for layout, yellow for JS, gray for any other task, blue for HTML parsing, green for compositing, or idle
- Blue bars show networking
- Screenshots show incremental rendering and content shifts
- Vertical lines are important events... more later
CDT - Timeline
What are all the components of the browser doing
- Listed by thread
- Listed by important components (experience, interaction, timing)
- Also lists subframes of third parties
- Shows the important events for the timing analysis (LCP, FCP, DCL)
CDT - Networking
What does the network do and when?
- Shows network downloads
- Includes priorities
- Download is networking and overhead (resource loading)
- Reflects nicely the order of loading and when it happens in relation to other activities
CDT - Events
Our main signals
- DCL - DOMContentLoaded, HTML was loaded and processed
- FP - First Paint
- FCP - First Contentful Paint
- LCP - Largest Contentful Paint
- L - OnLoad, document fully loaded
CDT - Tasks
See what the browser does and what is too much
- Every red corner is a problem
- Tasks longer than 50 ms block the main thread
- Render reflows, aka layout changed by JS or dynamically loaded CSS or class changes in the tree
- Flamechart is great for debugging call stack and origins of problems
Size Matters
What you don't transport, you don't have to evaluate
- DOM: 1,500 nodes or less
- CSS: Only what you need on that page
- Images: Device specific, PNG/JPG/SVG carefully selected, meta data stripped, optimally compressed, CSS effects instead of images
- Responsive Images: Use srcset, sizes, and picture
- Fonts: Fewer fonts, fonts limited to used glyphs
- JS: Minify and select what you need, try async or defer
- Transport: Compressed, HTTP/2
- Cache: Make things cacheable as long as possible
- Cookies: As little as possible
- Inline: Don't do inline stuff, it is not cacheable
Things that block loading and rendering
Don't block the parser
- Short and clean HTML to shorten parsing
- CSS in front of all
- No or little JS to avoid blocking parsing
- Few, small, and possibly preloaded fonts
- Utilize Font Loading API
- Don't read or write styles with JS
- Provide WOFF2 and WOFF font formats
- Avoid optimizing for IE8 to 10, let it degrade gracefully, rather make it fast for 90% than right for 100%
- Don't block scrolling when loading more tiles
Relayouting
Avoid repainting what has been painted already
- Don't change the DOM
- Don't add late CSS
- Don't hide content with JS as part of a page load
- Don't resize content
- Avoid late fonts
- Don't let third parties add content, avoid flickering
- Announce image sizes or deliver progressive images for early size detection
Caching
You want recurring visitors
- Cache content as long as possible, let the browser deal with the expiration
- Cache in browser as public resource and permit caching by intermediaries
- Utilize fingerprints to publish new versions
base.css?665sdasdf
- Own your content, don't host externally for CSS, JS, fonts (exceptions apply)
- Expect users to start out of sync aka not on the homepage
- Everything can be a landing page, and the business may make the PDP a landing page
- Don't expect things to be cached just because the homepage loaded them
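A minimal sketch of the fingerprint idea from above: derive the suffix from the file content, so the URL changes only when the content changes and the resource can be cached for a very long time. The helper name, the hash function, the suffix length, and the one-year cache lifetime are assumptions for illustration, not a prescribed setup.

```python
import hashlib

# A typical long-lived caching header value for fingerprinted resources.
LONG_LIVED_CACHE_CONTROL = "public, max-age=31536000, immutable"


def fingerprinted_url(path, content, length=10):
    """Append a content hash to the path, e.g. base.css?<hash>."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    return f"{path}?{digest}"


if __name__ == "__main__":
    css_v1 = b"body { margin: 0; }"
    css_v2 = b"body { margin: 0; color: #222; }"
    print(fingerprinted_url("base.css", css_v1))
    print(fingerprinted_url("base.css", css_v2))  # different URL, new version
    print("Cache-Control:", LONG_LIVED_CACHE_CONTROL)
```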
Above the Fold
What the user sees first is important
- Focus on what the user sees first
- Keep pages short
- Don't spend money on things only 5% see
- Above the fold is important to paint first
- Consider content removal below the fold
- Progressively enhance
Late Content
You need space for your late auntie
- If something comes in late, reserve the space for it
- Consider a placeholder and enhance it
- If it flickers, it draws attention and distracts (one of the reasons why header sliders are a bad idea)
Responsive Feedback
Let the users know that they are waiting
- RAIL: Response
- If it takes longer than 50ms, indicate it
- When the user clicks, give feedback
- Progress bars, spinners, blocking the area that refreshes
- Don't change the size!!!
- Test functionality, lack of responsiveness is easily felt
- Don't betray yourself: No feature is better than a sluggish feature
Third Parties
Untangle yourself from third parties
- Own content and CSS/JS/Fonts as much as possible
- Be in control of quality, caching, and more
- A third party can bring you down
- Third parties can be blocked (extensions, proxies)
- Third parties are a security risk
- Load async or deferred, get them out of the main renderer
- Don't let them change the layout!
Availability
- A third party guarantees 99.5% availability
- This is 43.8 hours of unavailability per year!
- If you use 10 third parties on your site, it becomes 95.1% availability aka 430h or almost 18 days of unpredictable behavior per year.
- P.S. 99.9% makes it 87.6h, still more than 3 days!
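The availability numbers above can be checked with a short calculation. It assumes that the third parties fail independently and that any single failure affects the site; both are simplifications, and the function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def combined_availability(availability, count=1):
    """Availability of `count` independent services that all have to be up."""
    return availability ** count


def downtime_hours(availability):
    """Expected hours of unavailability per year."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    print(round(downtime_hours(0.995), 1))       # one third party at 99.5%
    ten = combined_availability(0.995, 10)
    print(round(ten * 100, 1), "%")              # ten third parties at 99.5%
    print(round(downtime_hours(ten)), "h")
    print(round(downtime_hours(combined_availability(0.999, 10)), 1), "h")
```

Compounding gives roughly 428 hours for ten services at 99.5% and about 87.2 hours at 99.9%, slightly below the 87.6 hours you get from simply adding up ten times 8.76 hours.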
Conclusion
All in one Slide
Know the Stack
- Unblock the main render thread
- Don't change layout late
- Early feedback and above the fold rendering
- Progressively enhance
- Cut the third party cord
- Go light!
Questions And Answers
Your Questions and Feedback