urlscan.io Blog


Product Updates for 2021

This post will be a recap of new features we launched in 2021, covering our community platform and our commercial products. There will be a separate post with our 2022 product roadmap later.

Scanning Engine v2

As one of our biggest projects in 2021, we rewrote our scanning engine from scratch. The result is Scanning Engine v2, which can be used for a multitude of purposes. The most important use case is still the regular scanning of URLs submitted through urlscan.io, which continues to work without any visible changes for the user. Other use cases include the Live Scanning feature and various internal scanning tasks that can now all be covered by the same codebase. These changes let us deploy the engine much more quickly, with just a few lines of infrastructure definition. The new engine is also much more modular: it offers a vast array of options that can be changed at scan time, for example whether to store data to backend storage, and it can be run with different backend modules and different connectivity options.

The new scanning engine allowed us to re-enable File Downloads on urlscan.io. Whenever a website triggers a file download, we now capture the downloaded file, hash it with SHA256 and store it ourselves. Users on the urlscan Pro platform are able to download the files from our archive. Using the files.sha256 modifier, users can search for scans which triggered file downloads. Because we run a full web browser, we can even capture file downloads that rely on advanced evasion tactics like HTML Smuggling.

Because of the new scanning engine, we are able to run our scanner in more locations. As a result we introduced Country Selection for scans for all users of urlscan.io. If a user does not select a country explicitly, we have rules in place that try to determine the best scanner location for scanning a URL.

Live Scanning

With the new, modular and highly configurable Scanning Engine v2 we were able to quickly set up a large number of geographically distributed scanners. Geofencing and localisation are often issues when performing scans, so running scanners from different places across the globe will hopefully allow more websites to be scanned correctly. The name Live Scanning highlights the fact that the scans are faster and more lightweight because the scan results are not automatically stored to urlscan.io. By default, Live Scanning results only live in a temporary database for 60 minutes, unless the user explicitly requests a Live Scan to be stored permanently on urlscan.io.

Live Scanning is one of the building blocks we and our customers are able to use for different purposes. It lets users quickly iterate through geolocations and browser options to get a page to behave as expected. But it can also be used as an appliance for specific tasks where a full scan on urlscan.io would be unwarranted. For example, customers can use Live Scanning to quickly take a screenshot of a website, or to grab a file from a remote geographical location. With some additional logic, the Live Scanning API output can be used to monitor websites for changes.
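
To give an idea of what that "additional logic" for change monitoring might look like, here is a minimal sketch that compares two DOM snapshots by hash. It assumes the snapshots have already been retrieved as strings; how they are fetched from the Live Scanning API is not shown, and the function names are hypothetical.

```python
import hashlib


def fingerprint(dom_snapshot: str) -> str:
    """Hash a DOM snapshot after collapsing runs of whitespace."""
    normalised = " ".join(dom_snapshot.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(previous_dom: str, current_dom: str) -> bool:
    """Report whether two snapshots differ beyond whitespace-only changes."""
    return fingerprint(previous_dom) != fingerprint(current_dom)
```

Storing only the fingerprint of the last scan keeps the monitoring state small. A real monitor would likely also need to ignore dynamic content such as timestamps or rotating tokens.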

Here is what Live Scanning allows you to do:

  • Run scans from one or more of 25 possible countries in Europe, Asia, and North and South America
  • Capture full-page screenshots of the website
  • Change the page capture timeouts to better capture “slow” websites
  • Enable or disable captured artifacts, like the screenshot, DOM snapshot, or file downloads
  • Emulate common devices (iPhone, Android phones, etc), including the User Agent and device dimensions
  • Supply a custom HTTP User Agent and additional custom HTTP headers
  • Supply a custom Accept-Language header indicating language preferences
  • Use the automatic Banner Bypass feature to accept and hide typical cookie-consent banners
  • Scan pages on the Tor network (.onion)

Here is a screenshot of Live Scanning in action.

urlscan Pro

urlscan Pro received the most significant and visible updates and new features in 2021. Major updates included our new Live Scanning feature and Similarity Search UI and API. The details of the Similarity Search function are documented in the urlscan Pro portal.

We added hundreds of additional Brands to our brand monitoring system and made the brands easier to explore in the urlscan Pro portal. For each brand we show users helpful links to find potential additional brand-impersonation attempts.

We added a Video Walkthrough to the urlscan Pro frontpage to help new users get started. The walkthrough covers all major aspects of urlscan and is recommended viewing for new users.

To get a glimpse of the new features, check out the dedicated page for urlscan Pro. We are happy to report that urlscan Pro is our most popular product, and the feedback we have received from customers has been very positive. Customers also used our urlscan Pro / Enterprise Survey to give structured feedback, which will be incorporated into the roadmap going forward.

Products & Pricing

We simplified our product lineup to make it easier to grasp for potential customers. We now have two pure API plans on the lower end (Starter and Advanced), which are aimed at customers who are looking to automate URL analysis, e.g. via their SOAR tool. The two plans on the high end (Professional and Enterprise) are meant for larger customers who have more demanding automation needs, and for customers who are interested in using the urlscan Pro platform, its data and APIs to hunt for interesting scans and quickly pivot on various attributes of existing scans to find the proverbial needle in the haystack.

Documentation

We added documentation pages for understanding every searchable field in our Search Index, and the semantics behind it. We also added a Result API Reference page.

Security

We added support for Two-Factor Authentication (2FA) based on TOTP to urlscan.io for all users. Customers on team accounts can check that all team members have 2FA enabled.
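
For readers unfamiliar with TOTP, the sketch below shows how a time-based one-time code is derived according to RFC 6238. The 30-second step, six digits and HMAC-SHA1 are the RFC defaults commonly used by authenticator apps; we assume them here for illustration, and this is not a description of urlscan.io's own implementation.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant) for a shared secret."""
    if at_time is None:
        at_time = time.time()
    # The moving factor is the number of time steps since the Unix epoch.
    counter = int(at_time // step)
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation as defined in RFC 4226.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Both the server and the authenticator app compute the same code from the shared secret and the current time, which is why a code is only valid for a short window.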

For customers on the urlscan Professional and urlscan Enterprise plans we implemented the option to log in via Federated Authentication, often just called Single Sign-On (SSO). We support SSO via the SAML 2.0 protocol, which works with all major identity providers such as Okta, OneLogin, Ping Identity, etc.

Community Platform

We made hundreds of small but impactful changes to our urlscan.io community platform. In addition to Country Selection and Two-Factor Authentication, we rewrote the Bulk Submit Page for logged-in users to offer more scan options and to back off if API quotas are exceeded.
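
The same back-off idea can be useful for anyone submitting URLs in bulk through an API client. The sketch below is one possible approach, not the logic of the Bulk Submit Page itself: it assumes that an exceeded quota is signalled with HTTP status 429, and the submit callable is a placeholder for whatever function performs the actual request and returns the status code.

```python
import time
from typing import Callable, Iterable, List, Tuple


def submit_with_backoff(
    urls: Iterable[str],
    submit: Callable[[str], int],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[str, int]]:
    """Submit each URL, retrying with exponential backoff while the status is 429."""
    results = []
    for url in urls:
        status = submit(url)
        attempt = 0
        while status == 429 and attempt < max_retries:
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
            attempt += 1
            status = submit(url)
        results.append((url, status))
    return results
```

Passing the sleep function as a parameter makes the retry schedule easy to test without actually waiting.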

The Result Page for scans was continuously improved to better show the raw data we capture and to highlight new data collected by our Scanning Engine v2, such as the geographical location the scan was performed from, and built-in pivots for links on the website.

When you link to the urlscan.io frontpage with a URL fragment (the part after the #), everything in the fragment is pre-filled into the submission box, so all you have to do is hit submit. This is how it works: https://urlscan.io/#https://google.com
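
If you want to generate such links from a script, a small helper like the following would do. The function name is our own, and the target URL is appended as-is, mirroring the example link; whether special characters need additional encoding is not covered here.

```python
FRONTPAGE = "https://urlscan.io/"


def prefill_link(target_url: str) -> str:
    """Return a frontpage link whose fragment carries the URL to pre-fill."""
    return f"{FRONTPAGE}#{target_url}"
```

Calling prefill_link("https://google.com") reproduces the example link shown above.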