Webscraping / automation

Hi,
has anyone got any pointers to using Nyxt for website automation or webscraping projects?

e.g. a simple example of what I’d like to automate:

  1. open https://octordle.com/

  2. click on ‘daily octordle’

  3. type in my first guess:
    ‘Fjord’

Is such a thing achievable with some Lisp magic? (I know Emacs Lisp well, and am itching for a project to try out Common Lisp.)


Thanks for your interest, and sorry for the late reply! I’ve been thinking about how to show off all the features in a single piece of code and give pointers to other useful stuff. Here is the code; just put it in a separate file (say, ~/.config/nyxt/script.lisp):

(in-package #:nyxt-user)

;; Nyxt shows a session restore prompt by default; we don't want it here.
(define-configuration browser
  ((session-restore-prompt :never-restore)))

;; `*after-startup-hook*' is a hook that fires once Nyxt has fully
;; started and is ready for interaction/scripting.
;;
;; `on' is a macro to bind an action to a hook.
;; - The first argument is the hook to bind the action to.
;; - The second argument is either a single name or a list of names
;;   to bind the hook's arguments to.
(on *after-startup-hook* ()
  ;; The body of the hook handler/action.
  ;; We simply load the URL and set some more hooks there.
  (buffer-load "https://octordle.com/?mode=daily")
  ;; `once-on' is a small macro (akin to `on') that makes it easy to
  ;; bind one-shot actions to hooks. It only invokes the action
  ;; once, and then deletes it.
  (once-on
      ;; `buffer-loaded-hook' is the most useful hook for acting on a
      ;; page that has just finished loading. Other useful hooks are:
      ;; - `request-resource-hook' to redirect/block requests.
      ;; - `prompt-buffer-ready-hook' to fill the `prompt-buffer' with
      ;;   input or select some suggestions (see
      ;;   /tests/renderer-offline/set-url.lisp for an example
      ;;   usage).
      ;; - `pre-request-hook', which runs before any
      ;;   `request-resource-hook' handlers and lets you toggle
      ;;   modes before they start tinkering with requests.
      ;; - `enable-mode-hook' and `disable-mode-hook' to watch mode
      ;;   toggling.
      ;; - `window-set-buffer-hook' for when you switch buffers.
      (buffer-loaded-hook (current-buffer)) (buffer)
    ;; It's sometimes necessary to sleep, as `buffer-loaded-hook'
    ;; fires when the page is loaded, which does not mean that all the
    ;; resources and scripts are done loading yet. Give it some time
    ;; there.
    (sleep 0.5)
    (echo "Loaded daily puzzle, trying 'fjord'.")
    ;; Defining a local function for brevity. It's all plain Common
    ;; Lisp, so you can use macros and whatever language/library
    ;; constructs in your hook handlers.
    (flet ((select-click (selector)
             ;; nyxt/dom is a library with element classes (like
             ;; `nyxt/dom:h1-element') and handy functions like
             ;; `nyxt/dom:click-element'. Try "C-space
             ;; describe-function nyxt/dom:" to find all the other
             ;; functions and classes available there.
             (nyxt/dom:click-element
              :nyxt-identifier
              (get-nyxt-id
               ;; `document-model' is a Lispy structure (`plump:root',
               ;; namely) mirroring the loaded page.
               ;;
               ;; `clss:select' behaves much like JavaScript's
               ;; querySelectorAll method: it selects all the nodes
               ;; matching the selector and returns them as a vector
               ;; (hence the `elt' call).
               (elt (clss:select selector (document-model buffer)) 0)))))
      ;; We use the IDs of the on-screen keyboard keys, as they are
      ;; easy to reference.
      ;;
      ;; You can find out which elements there are with the WebKit
      ;; inspector: just run "C-space open-inspector".
      (dolist (key '("#f" "#j" "#o" "#r" "#d" "#enter2"))
        (select-click key)))
    (echo "Typed in the word")
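    ;; The box IDs contain commas, which must be escaped with a
    ;; backslash in a CSS selector (written "\\," in a Lisp string).
    (loop for box in '("#box1\\,1\\,1\\," "#box1\\,2\\,1\\," "#box1\\,3\\,1\\," "#box1\\,4\\,1\\," "#box1\\,5\\,1\\,")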
          ;; `peval' allows you to evaluate any Parenscript (Lispy
          ;; syntax for JavaScript) in the current buffer.
          ;;
          ;; `nyxt/ps' contains our Parenscript extensions and
          ;; helpers. Look at source/parenscript-macro.lisp in the
          ;; Nyxt sources to see which helpers there are.
          collect (peval (ps:chain (nyxt/ps:qs document (ps:lisp box)) style background-color))
            into guesses
          finally (echo "The results are: ~a" guesses))
    ;; It's good manners to call `nyxt:quit' once you're done, though
    ;; with nyxt --no-socket you don't have to kill any instance :)
    (nyxt:quit)))

and then run this file with Nyxt as

# --headless runs Nyxt without a GUI. Omit it to watch the automation happen.
# --no-socket allows you to run several instances of Nyxt at once. It can be useful, but is generally confusing, so I recommend omitting it.
# --config points to the configuration file to load, i.e. our script.
nyxt --headless --no-socket --config ~/.config/nyxt/script.lisp

Explanation

Nyxt recently got a --headless mode, which allows it to run without a GUI. In that scenario, the file passed with --config serves as a script that automates Nyxt. You can put any Lisp code into this file, but for web scraping the most useful code generally binds hooks and interacts with the page. The comments in the code snippet above should point you to the facilities Nyxt has for hooks and page interaction.
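If the difference between on and once-on is new to you, here is a toy model in plain Common Lisp for intuition. It is only a sketch: the toy-hook structure and the toy-on, toy-once-on and toy-run-hook functions are hypothetical and are not how Nyxt implements its hooks. The point it illustrates is that a persistent handler runs every time the hook fires, while a one-shot handler removes itself the first time it runs.

```lisp
;; A toy model of hooks, for intuition only: NOT Nyxt's implementation.
;; A hook here is just a list of handler functions.
(defstruct toy-hook
  (handlers '()))

(defun toy-run-hook (hook &rest args)
  "Call each handler of HOOK with ARGS, oldest handler first."
  ;; REVERSE returns a fresh list, so handlers can safely remove
  ;; themselves from HOOK while we iterate.
  (dolist (handler (reverse (toy-hook-handlers hook)))
    (apply handler args)))

(defun toy-on (hook fn)
  "Add FN as a persistent handler: it runs every time HOOK runs."
  (push fn (toy-hook-handlers hook))
  fn)

(defun toy-once-on (hook fn)
  "Add a one-shot handler: it calls FN once, then removes itself."
  (let ((wrapper nil))
    (setf wrapper
          (lambda (&rest args)
            ;; Remove ourselves from the hook first, then do the work.
            (setf (toy-hook-handlers hook)
                  (remove wrapper (toy-hook-handlers hook)))
            (apply fn args)))
    (push wrapper (toy-hook-handlers hook))
    wrapper))

(defun toy-hook-demo ()
  "Run a hook twice with one persistent and one one-shot handler.
Return the list of calls made, in order."
  (let ((hook (make-toy-hook))
        (calls '()))
    (toy-on hook (lambda (url) (push (list :on url) calls)))
    (toy-once-on hook (lambda (url) (push (list :once url) calls)))
    (toy-run-hook hook "first load")
    (toy-run-hook hook "second load")
    (reverse calls)))

;; (toy-hook-demo)
;; => ((:ON "first load") (:ONCE "first load") (:ON "second load"))
```

This is why the script uses once-on for the buffer-loaded-hook handler: the clicks and the result check should happen after the first page load only, not on every subsequent load.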

Now, the automation workflow I usually follow (and that you’re probably even more familiar with) is:

  • Open the page to scrape or interact with.
  • Try interacting with it the way you want to automate, and see what changes. Use the open-inspector command and the Inspect Element context menu option, if necessary.
  • Work out the sequence of actions you need, e.g. page loading -> button clicks -> style fetching, as in the script above.
    • If the events are network- or page-level, they most likely correspond to hooks on the Nyxt side. Search for them in Nyxt with C-space describe-any hook. See the list of the most useful hooks in the script above. Use on and once-on to attach handlers to those hooks, or look at the manual for the more verbose but more customizable syntax.
    • If you want to perform some action on the page or fetch some information from it, use peval, the nyxt/dom functions, or the ffi-buffer-evaluate-javascript function (which underlies the former two).
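To type guesses other than "fjord", the hard-coded selector lists in the script could be generated instead. A minimal sketch, assuming the page keeps the element IDs the script relies on (one #<letter> ID per keyboard key, #enter2 for Enter, and result boxes whose IDs differ only in their second number); the helper names guess-key-selectors and guess-box-selectors are hypothetical, not Nyxt functions. The commas in the box IDs are escaped with a backslash because an unescaped comma separates selectors in CSS.

```lisp
;; Hypothetical helpers (not part of Nyxt) that build the selector
;; lists the octordle script hard-codes.

(defun guess-key-selectors (word &key (enter-selector "#enter2"))
  "Return the CSS selectors for typing WORD on the on-screen keyboard,
followed by the selector of the Enter key."
  ;; Assumption: each letter key has the lowercase letter as its ID.
  (append (map 'list
               (lambda (char) (format nil "#~a" (char-downcase char)))
               word)
          (list enter-selector)))

(defun guess-box-selectors (&key (count 5))
  "Return the CSS selectors of the first COUNT result boxes read by the script."
  ;; In a Lisp string, \\ is one backslash, so \\, yields the CSS
  ;; escape \, for a literal comma inside the ID.
  (loop for position from 1 to count
        collect (format nil "#box1\\,~d\\,1\\," position)))

;; (guess-key-selectors "Fjord")
;; => ("#f" "#j" "#o" "#r" "#d" "#enter2")
```

Inside the script, the dolist over keys would then become (dolist (key (guess-key-selectors "fjord")) (select-click key)), and the loop over boxes could iterate over (guess-box-selectors).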

I hope that’s enough to get you started :slight_smile:


Thank you very much. There is indeed a lot for me to work with, appreciate it!