ARCHIVED: Very happy with this tool, but I've stopped using it, especially because we have LLMs now.
Extract and search the posts you've up-voted on StackOverflow. Look back on your data.
I need to quickly browse and re-learn from questions I've up-voted in the past. This is a browser extension and search UI for doing that. See Background for more information.
NOTE: This project was developed on macOS. It is for my own personal use.
The overall flow of the tool breaks down like this:
- Scrape your votes data from https://stackoverflow.com
- Expand the votes data into posts data using https://data.stackexchange.com
- View and search the posts
The application is made up of several distinct programs:
- A browser extension
  - The code is in `src/`
- A search frontend
  - This is implemented as a React application with Next.js.
  - The code is in `search-ui/`
- A search backend
  - Algolia is used as the search backend.
  - For local development and experimentation, there is also a Lucene-based web server in `search-api/`. This acts as a substitute for Algolia.
The source code of the browser extension is generally grouped by the execution context that the code runs in. This layout leaves room for future additions like Manifest V3 support or a Safari browser extension.
- `util/`
  - Miscellaneous utility code that is not specific to the Look Back Tool.
- `src/`
  - The code in this directory is specific to the Look Back Tool.
- `src/web-page/`
  - The code in this directory runs on the web page.
- `src/backend/`
  - The code in this directory runs in the extension backend contexts: background workers, popups, and content scripts.
- `src/chromium-manifest-v2/`
  - Code that supports a Manifest V2 web extension developed for Chromium browsers.
- `src/firefox-manifest-v2/`
  - Code that supports a Manifest V2 web extension developed for Firefox.
There is one library dependency for the extension: https://github.com/dgroomes/browser-extension-framework. The BrowserExtensionFramework is an RPC-centric web extension framework that was originally developed as part of the Look Back Tool codebase.
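To give a feel for what "RPC-centric" means here, below is a minimal in-memory sketch. It is not the BrowserExtensionFramework API. Every name in it (`SketchRpcServer`, `SketchMessagingProxy`, the `target` field, the procedure names) is made up for illustration, and it is synchronous for brevity whereas real extension messaging is asynchronous.

```typescript
// All names here are hypothetical. This is not the BrowserExtensionFramework API.
type RpcTarget = "background" | "web-page";

interface RpcRequest {
  target: RpcTarget; // the side that should execute the procedure
  procedureName: string;
  args: unknown;
}

type Procedure = (args: unknown) => unknown;

// Holds named procedures and executes the one that a request asks for.
class SketchRpcServer {
  private procedures = new Map<string, Procedure>();

  register(procedureName: string, procedure: Procedure): void {
    this.procedures.set(procedureName, procedure);
  }

  handle(request: RpcRequest): unknown {
    const procedure = this.procedures.get(request.procedureName);
    if (procedure === undefined) {
      throw new Error("Unknown procedure: " + request.procedureName);
    }
    return procedure(request.args);
  }
}

// Stands in for a content script proxy. It forwards requests in both
// directions based on the request's target.
class SketchMessagingProxy {
  private background: SketchRpcServer;
  private webPage: SketchRpcServer;

  constructor(background: SketchRpcServer, webPage: SketchRpcServer) {
    this.background = background;
    this.webPage = webPage;
  }

  forward(request: RpcRequest): unknown {
    const server = request.target === "background" ? this.background : this.webPage;
    return server.handle(request);
  }
}

// Example wiring: the web page executes "scrape-votes", the background executes "save-votes".
const webPageServer = new SketchRpcServer();
webPageServer.register("scrape-votes", (args) => "scraping up to " + String(args) + " pages");

const backgroundServer = new SketchRpcServer();
backgroundServer.register("save-votes", (args) => "saved " + (args as number[]).length + " votes");

const proxy = new SketchMessagingProxy(backgroundServer, webPageServer);
```

The point of the sketch is the shape: callers name a procedure and a target, and a proxy in the middle decides which side runs it.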
The extension has been verified to work in the checked [x] browsers:
- Firefox (version 91)
- Chrome (version 103)
- Opera (version 78)
- Edge
- Safari
In my opinion, content scripts are not compelling and I don't quite get their necessity in browser extension technical architecture. From my perspective there of course needs to be one isolated JavaScript execution environment that powers an extension. Why do there need to be two? The extension context has access to powerful browser APIs. The web page itself is a powerful execution environment because it has access to the DOM and the application source code. So what is the place of content scripts? I know that, by design, they have access to the DOM while the extension environment does not. But why? I'm sure there are good reasons. But the three different execution environments and their unique capabilities and restrictions have made it difficult to design and implement my own code.
A corollary to my bias against content scripts is my bias for Web APIs. Most of the source code for this extension actually executes on the web page, where standard Web APIs can be used. This code executes the domain logic like the data scraping and HTML generation. As such, this code is perfectly portable to other "evergreen" browsers because it just relies on standard web APIs instead of non-standard browser extension APIs (i.e. Manifest V2 and V3).
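As an illustration of the kind of portable domain logic that can run on the web page, here is a minimal sketch that turns a post link into a post type and ID. The URL shapes, the `ScrapedVote` type and the `parseVoteUrl` function are assumptions made for this example, not code from the extension. On a real page the links would come from a standard Web API such as `document.querySelectorAll`.

```typescript
// Hypothetical shape for a scraped vote. The extension's real types may differ.
interface ScrapedVote {
  postType: "question" | "answer";
  postId: number;
}

// Assumed URL shapes (not taken from the project):
//   question: /questions/<questionId>/<slug>
//   answer:   /questions/<questionId>/<slug>/<answerId>#<answerId>
// Returns null when the URL does not look like a post link.
function parseVoteUrl(url: string): ScrapedVote | null {
  const match = /\/questions\/(\d+)\/[^\/#?]+(?:\/(\d+))?/.exec(url);
  if (match === null) {
    return null;
  }
  if (match[2] !== undefined) {
    // A trailing numeric segment means the link points at an answer.
    return { postType: "answer", postId: Number(match[2]) };
  }
  return { postType: "question", postId: Number(match[1]) };
}
```

Because the function only works on strings, it behaves the same in any browser and is easy to try out in the console.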
Follow these instructions to install the tool as a Chrome browser extension and use it:
- Install npm
- Clone the BrowserExtensionFramework (BEF) Git submodule:
  - `git submodule update --init`
- Build the BEF distribution
  - Follow the build instructions in the BEF README. It is located at `browser-extension-framework/README.md`.
- Install BEF:
  - `npm install browser-extension-framework/framework/dgroomes-browser-extension-framework-0.1.0.tgz`
- Run the Webpack build:
  - `npm run build`
- Build the extension distributions:
  - `./build.sh`
  - This takes about a minute! I'm assuming the TypeScript type checking takes a lot of time.
- Open Chrome's extension settings page
- Enable developer mode
  - Enable the Developer mode toggle control in the upper right corner of the page
- Install the extension
  - Click the Load unpacked button
  - In the file finder window that opens, find the extension distribution directory `build/chromium-manifest-v2-web-extension/`, single click it to highlight it, and click the Select button.
  - It's installed!
- Open StackOverflow
  - Go to https://stackoverflow.com/ in your browser
  - Log in
- Open your profile
  - Click your picture in the top right corner to open your profile
- Open the "Votes" tab
  - Find the "Votes" tab and click it.
  - For me, my Votes tab navigates to this URL: https://stackoverflow.com/users/1333713/david-groomes?tab=votes
- Scrape the votes data
  - Open the extensions menu by pressing the puzzle icon in the top right of the window
    - Alternatively, for Opera, it is a cube button
    - Alternatively, for Firefox, there is NOT an extensions menu and instead you invoke the extension directly by clicking a puzzle icon button on the right side of the URL bar.
  - Click the "stackoverflow-look-back" extension entry
  - A popup will show up with buttons titled "Scrape votes" and "Expand posts". Click "Scrape votes" and check the console logs. The votes data will have been scraped and saved to browser storage.
- Expand the post data
  - Go to the Stack Exchange Data Explorer
    - If not logged in, then log in and navigate back to the original page.
  - Repeat the earlier steps to open the extension entry
  - The same popup will appear. Click "Expand posts". The post data will be expanded and saved into browser storage.
- Download the posts data
  - While on the same StackExchange page, repeat the earlier steps to open the extension entry
  - Click the "View posts" button
  - Click the download button. Now, you have a copy of the data in a JSON file.
- Upload to Algolia
  - You're on your own for this step. Algolia is really easy to use.
- Search the posts
  - Follow the instructions in `search-ui` to run the search UI.
  - Finally, search for that one post you up-voted that has the magic incantation of code that you urgently need!
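For the "Upload to Algolia" step, a small amount of preparation can help before handing the data to whichever Algolia upload method you pick. The sketch below is only an assumption about how that preparation might look: the `DownloadedPost` shape is a guess at the downloaded JSON, and the helper names are made up. The one Algolia-specific detail it relies on is that each record is identified by an `objectID` attribute.

```typescript
// Hypothetical shape of one downloaded post. The real JSON file may have more or different fields.
interface DownloadedPost {
  id: number;
  type: "question" | "answer";
  htmlBody: string;
}

// An Algolia record is identified by its "objectID" attribute.
interface AlgoliaRecord extends DownloadedPost {
  objectID: string;
}

// Reuse the post ID as the objectID so that re-uploading a post replaces it instead of duplicating it.
function toAlgoliaRecords(posts: DownloadedPost[]): AlgoliaRecord[] {
  return posts.map((post) => ({
    id: post.id,
    type: post.type,
    htmlBody: post.htmlBody,
    objectID: String(post.id),
  }));
}

// Split records into batches so that each upload request stays a manageable size.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```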
The tool can also be installed as a web extension in Firefox! Follow these instructions to install it:
- Open Firefox to the debug page
  - Open Firefox
  - Paste and go to this URL: about:debugging#/runtime/this-firefox
- Load the plugin
  - Click the button with the words Load Temporary Add-on…
  - In the file finder window that opens, find the file `build/firefox-manifest-v2-web-extension/manifest.json` and click Open
  - It's installed!
The extension can also run in Opera.
Follow these instructions to install it in Opera:
- Open Opera to the debug page:
  - Open Opera
  - Paste and go to this URL: opera:extensions
- Enable developer mode
  - Toggle on the Developer mode control in the top right corner
- Load the plugin
  - Click the "Load unpacked" button
  - In the file finder window that opens, find the directory `src/extension/chromium-manifest-v2` and click Select
  - It's installed!
General clean ups, TODOs and things I wish to implement for this project:
- Support the Edge browser. Write a Powershell script to build the extension distributions. This is the Windows friendly thing to do. Add instructions as needed.
- Implement a "recents" feature? Maybe the most relevant StackOverflow posts are the ones I just added! I'm revisiting them continually until I understand them (concepts) or memorize them (commands or code snippets).
- DONE Replace the viewer stuff with a standalone "Search UI". I've already implemented a good deal of this effort in `search-ui/`. It is a single-page app built with Next.js and the posts data lives in Algolia. The UI uses Algolia's sophisticated component library. I'm impressed with the developer experience of Algolia (and I also notice the price tag $$$).
- Handle case insensitivity in the search result highlighting. Unfortunately, this means the algorithm has to be changed considerably. Something to do with carrying a pair of "the original text section" and a "normalized (lowercased) text section" and somehow preserving case in the original phrase. Maybe regexes are the right choice but then I have to escape the search term if it has regex special characters (which maybe they never do?).
- Facet search. Those are the clickable search categories you see in many search UIs.
- Show the question title.
- Give a visual indication that an entry is an answer to a question. Question entries, by contrast, will show unadorned.
- Fix redundant calls to `toJSON`. The whole purpose of that design is that it is called implicitly by the browser `JSON.stringify` API.
- Consider what to improve about indexing `htmlBody`. Unfortunately SEDE only provides the HTML body and doesn't have an option for just the content. That's perfectly reasonable. But it's awkward to index the markup. Does it have any negative effect on search results? If not, then I can live with it. If I really wanted I could parse out the text nodes.
- Support an incremental scrape and incremental load from SEDE, and even incremental download to file... dang that's kind of expensive. Maybe don't rush to do this.
- Get the answer's question title in the JSON data. Mostly this data is already there, but there are plenty of answers I've upvoted where I didn't upvote the question. So, this has an effect on the SEDE part too.
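For the case-insensitive highlighting item above, one possible approach is the regex route with the search term escaped first. This is only a sketch under assumptions: the `<mark>` wrapper and both function names are made up for the example, and it treats the text as plain text rather than HTML.

```typescript
// Escape characters that have special meaning in a regular expression
// so that the search term is matched literally.
function escapeRegExp(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Wrap every case-insensitive match of `term` in <mark> tags.
// The matched text keeps its original casing because the replacement
// reuses the matched substring instead of the search term.
function highlight(text: string, term: string): string {
  if (term.length === 0) {
    return text;
  }
  const pattern = new RegExp(escapeRegExp(term), "gi");
  return text.replace(pattern, (matched) => "<mark>" + matched + "</mark>");
}
```

With this shape there is no need to carry a separate lowercased copy of the text, because the `i` flag does the case folding during matching.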
These are the finished items from the Wish List:
-
DONE Make an `entrypoint.js` file instead of re-using both `scrape-votes.js` and `expand-posts.js` independently
-
DONE Get more re-use out of code. For example, re-use the Votes class between the scrape votes functionality and expand posts functionality
-
DONE Get post data for questions that were not up-voted but where there was an up-voted answer to that question. This is a common case. I thought it was rare because I assumed that when I upvote an answer I would have already upvoted the question. But this isn't the case. I have about two hundred of these cases. Also, even if I wanted to up-vote the question, some are actually locked! For example, one of the very first things I wanted to search for in my SO static data was how to get the query parameters of the URL from JavaScript. But the question and answer didn't show up because I didn't upvote the question, only the answer, and it turns out the question itself is locked!
-
DONE Create a browser extension for this. The main benefit should be the removal of the manual steps like opening three different web pages and moving the downloaded files to different directories.
-
DONE (Update: I think it's a race condition with the JavaScript doc load order) There are some occasional caching problems. Sometimes when I load a page, it says "AppStorage" is not defined and stuff like that. I think it's a caching problem because when I "hard reload and empty caches" it works. But then later it might fail again although I haven't even changed the code so I don't understand how the cache could still be stale, and thus still be a problem. Not sure. But it's annoying.
-
DONE Create a Chrome Manifest v2 extension. This would enable making a Firefox extension, which is still on v2 but is working on supporting v3 sometime in 2022.
-
DONE Build a Firefox extension for the tool. For the most part, code can be re-used, but when it comes to the extension APIs themselves, there are significant differences. In fact, porting the extension to Firefox has been one of the most challenging software efforts I've done in recent years! In part, because I've been away from JavaScript dev for so long but also because the standardization of extension APIs is still a work-in-progress.
-
DONE Drop the Manifest V3 implementation. I originally implemented the Chrome extension using the Manifest V3 format for the simple reason that the Chrome getting started docs for extension development uses Manifest V3. This was my first web extension. Now that I've ported this to Firefox, I know much more about the extension landscape, especially the APIs. For example, Firefox is working on Manifest V3 support and it is a large effort which will take until early 2022 at the earliest. See this related blog post at blog.mozilla.org. Firefox will support Manifest V2 for at least another year. So that's early 2023 at the earliest. There is no value proposition for me to support a Manifest V3 version of the extension today when I can pay that implementation cost when the time comes that Manifest V2 support ends. The cost will almost definitely be lower then than now because of the inevitable enrichment of docs, StackOverflow posts, etc over time. So, drop the Manifest V3 support.
-
DONE Create an extension HTML page as an alternative to `generate-html.html`. This page will render the post data in a similar way but it will stop short of the downloading step. This page is meant to be used as an ephemeral view. Why? This is mostly just convenient so that I don't have to download the generated HTML and open it in a new tab over and over again while iterating on the UI.
  - DONE Create a browser action to open the "generate-html.html" page
  - DONE (only implemented "Scrape" and "Expand") Because web extensions are only allowed one UI control in the browser, we can't just add a new button to implement this feature. Instead, we need to extend the `execute.html` page and remove its "automatic action detection based on URL" logic and replace it with explicit "Scrape Votes", "Expand Post Data", "View", and "Download" buttons. This was actually the original implementation a long while back so I can copy from the original code.
  - ABANDONED (Something strange is up with the extension styles, there's some injected CSS I don't know where it's coming from) Fix the styles
  - ABANDONED (Chrome only allows either a browser action or page actions, but not both. Oh well. I've figured out I can just bookmark the extension HTML page which works great.). Allow the extension to show the "View posts" button from any page. This should be a "browser action" instead of a "page action" (I'm so glad I dropped the Manifest V3 support because then I'd have to solve for the unified actions way too).
-
DONE (Although this is a memory hog) Fix the CSS grid problem
-
DONE Known issue: The visual elements in the page break after the 1500th post in Chrome. I think this is because of an internal limit on CSS Grid sizes. See the note in the CSS Grid w3 standards page. It mentions 1500 and 3000, and when I go to exactly 1501 posts (there will be 2 * 1501 = 3002) the last post doesn't get rendered correctly. I think that's the limit. This issue does not happen in Safari.
-
DONE (implemented for only a single search term) Consider creating a search bar where multiple terms can be searched at once. Originally, I was hoping `Cmd + F` would be good enough for search but when the search term is SQL or bash, a lot of results come up and it's useful to add a second search term to reduce the results. This would add quite a bit of code to the page though.
-
Include tags data. This would enable the ability to search by tags too.
-
SKIPPED Consider using modules, but also consider to NOT use modules. Modules are modern, but modules aren't exported in the global context therefore we forego the usual luxury of "executing code ad-hoc on the console to our delight". This is kind of a major bummer. Also modules can't be imported in web workers in Safari and Firefox so that is also a bummer when considering converting this tool to a browser extension.
  - This was SKIPPED because even the official Chrome and Firefox repositories of example extensions do not use modules. I am following their lead. See:
    - https://github.com/GoogleChrome/chrome-extensions-samples. Only the "apps" examples use modules but Chrome Apps aren't extensions. Chrome Apps are deprecated.
    - https://github.com/mdn/webextensions-examples
-
DONE Use info and debug log levels. I think Firefox and Chrome now have good filtering for that in the dev console so it's pretty useful
-
DONE Remove the automatic trigger of opening the `generate-html.html` page after the post data is expanded and instead go to an on-demand-only trigger for this, a la the "View posts" button. This is symmetric to the way we trigger "Scrape votes" and "Expand posts". This is useful because of a technical constraint: it's hard to implement a request-request-response-response system when it comes to: 1) trigger "Expand posts" from the extension to the `content-script-messaging-proxy.js` 2) forward the "Expand posts" trigger to the web page 3) execute and wait for the response from `PostExpander.expandPosts` and return the response to the content script and finally 4) the content script returns the response to the extension
-
DONE Solidify on a "Posts viewer" name for the `generate-html.html` (do all the code renaming) and create a "download" option as a button on this page.
-
DONE Consider adding RPC from the extension to the web page. Currently there is only the other way where the extension background script is the RPC server and the web page is the RPC client. But the other way would create a needed communication channel. Currently, the way that the extension communicates commands to the web page is an awkward "load another tiny script on the page" strategy. The many little content scripts and web scripts added to handle the dispatch of the "scrape votes" or "expand posts" command is verbose. They include:
  - (DONE Converted to RPC) `content-script-scrape-votes.js`
  - (DONE Converted to RPC) `content-script-expand-posts.js`
  - (DONE Converted to RPC) `web-scrape-votes.js`
  - (DONE Converted to RPC) `web-expand-posts.js`

  They could all be removed and replaced with an RPC server (listener) that listens for the "scrape votes" or "expand posts" command from the extension background script.
  - DONE First, start by defining an `RpcServer` interface class and a `BackgroundScriptRpcServer` class. Use the `BackgroundScriptRpcServer` in `init-common.js`.
  - DONE Next, define a server on the front-end and a client in the background. This is a bit abstract so I need to gather my thoughts. Consider the direction-specific messaging channels that already exist:
    - From web page to background scripts (Chrome; `ChromiumRpcClient.js` to `ChromiumBackgroundScriptRpcServer.js`)
    - From web page to content scripts (Firefox; `FirefoxRpcClient.js` to `content-script-messaging-proxy.js`)
    - From content script to background (Firefox; `content-script-messaging-proxy.js` to `FirefoxBackgroundScriptRpcServer.js`)

    The stumbling block that I'll run into when developing a "background to front-end communication channel" is that I think the only way to "listen" for messages from the web page is via a `window.addEventListener` listener. Chrome's extension APIs allow a web page to send messages to the extension messaging system via `chrome.runtime.sendMessage` but I don't think there is a similar API to listen for messages. Instead we must resort to listening to the window object. And this design requires that we have a messaging component in a content script because content scripts have access to the window while the background scripts do not. Long story short, we need to incorporate `content-script-messaging-proxy.js` into our Chromium design (before, it was just for Firefox) and then extend `content-script-messaging-proxy.js` to handle both directions. It should transfer messages from the web page to the background scripts and it should do the reverse: transfer messages from the background scripts to the web page.
    - DONE Prototype a "server to front-end" RPC for Firefox. Why Firefox? Because it already incorporates the `content-script-messaging-proxy.js` so it will be easier. And if the prototype works, there's a much clearer path for a general implementation and/or a Chromium implementation.
-
DONE Standardize on RPC class naming convention.
- For clients, the name should follow: 1) BrowserDescriptor 2) SourceDescriptor 3) DestinationDescriptor 4) "RpcClient"
- For servers, the name should follow: 1) BrowserDescriptor 2) DestinationDescriptor 3) "RpcServer". The class comments should follow the same order.
-
DONE Consider turning `content-script-messaging-proxy.js` into a specific component of the RPC system. The genericness of it is becoming more confusing I think. This work will include baking in the "procedure target RPC" in the RpcClient and RpcServer classes and also handling it in the content script proxy.
-
DONE Consider how to move the generic RPC code in `extension-entrypoint.js` and the generic RPC code in `web-load-source.js` into the `src/rpc/` directory. Ideally, all generic RPC code should live separately from the other code. It should be such that the RPC framework is good enough to be used by even another project!
-
DONE Get rid of the symlinks. They don't work on Windows. I think I need a build script, like the Firefox build script. It should be pretty easy to make a Windows bat script or maybe PowerShell.
-
DONE Embed the "browserDescriptor" into the RPC Framework so that it may use it to instantiate the correct concrete sub-classes of RpcServer and RpcClient. Because there are multiple contexts (background, popup, content script, and web page), I think it's useful to save the browserDescriptor in storage.
  - DONE Create an `rpc-background-init.js` file. This should have a function to take the browserDescriptor as a parameter and save it to storage with some name like "rpc-browser-descriptor". The "rpc-" prefix should be used as a convention to make it clear that this property is owned and operated by the RPC framework and not by the app code. There should be another function to instantiate the `BackgroundToContentScriptRpcClient`. This would be a "factory" function. I assume there will be Chromium-specific and Firefox-specific versions of this client in the near future.
  - DONE Create an `rpc-storage.js` file that has functions to get and save the browserDescriptor
-
DONE Send a response from the web page RPC server to the popup client. This feature enables the popup to give feedback in the UI, like "Scraping..." and "120 votes scraped so far...". There won't be as much need to open the dev tools anymore to verify if the tool is working or not.
- DONE Implement for Chrome.
- DONE Implement for Firefox
-
DONE There is no need to fetch the votes page limit from the web page. It can be passed as an argument of the remote procedure call from the background.
-
DONE Clean up the References. Organize MDN links together.
-
DONE Remove the 'votesPageLimit' from storage and instead use an input box in the extension popup. The storage is not worth the code complexity. Plus the feature is not even really useful. Might as well remove the code and make the limit even more obvious by putting it right next to the "Scrape votes" button. This removes the discovery problem for that config.
-
DONE Tags. Add question tags to the data and to the UI. Sometimes, a question does not actually contain the relevant concept. For example, a question like "How to get the current time in seconds" with the tag "JavaScript" would not show up if you search "JavaScript", but I want it to show up.
- DONE. Get a working SQL query that returns tags. Is it an array type in SQL?
- DONE Update the sede.ddl
- DONE Update the SQL query. Update the Post type. Persist the data. Query back the data.
- DONE Visualize the tags in the UI
- DONE on tags.
-
ABANDONED (Possible, but not feasible) Fix static download. It doesn't include the JavaScript code. The search doesn't work.
  - ABANDONED (Abandoned because Chrome extensions by default do not allow any inline `<script>` tags for security. See this answer) Yikes, this is a bit involved. There's a fundamental issue which is that you can't just extract the contents of the `<script src="...">` tags and paste it into the page as an inline `<script>` tag. You can basically do this with CSS, which is awesome, but it won't work the same for JavaScript as explained here because of the same origin policy. I don't really want to do the technique described in the linked StackOverflow answer. How can I get what I want and not introduce too much complexity (or even reduce complexity)... I think I can inline the contents of `posts.viewer.js`, `PostsViewer.js` and `posts-viewer.css` into `posts-viewer.html`. In other words, get rid of those files and just use `posts-viewer.html`. This way, `posts-viewer.html` is already much closer to the "Download-ready format" we need to support the download button. Nice.
  - Re-download the external source and splice it into the page. This is the complicated solution that we must do because of the restriction described in the earlier item.
  - Delete the `<script src="...">` tags. These should not be included in the download. The downloaded file has to be completely static, no external dependencies can be downloaded at runtime.
-
DONE Bundle JavaScript source code with Deno. Deno lets us write TypeScript!
  - What is the first minimal step in incorporating Deno? I think we want to use Deno's `bundle` command to create a bundled entrypoint JavaScript. But on the other hand, I've discovered that it's inconvenient in general to use modules in a browser extension context. So, I'm not sure... Can the `init.js` file be bundled?
    - Update: we want to use ES modules for authoring code but not at runtime because of the aforementioned awkwardness of the support for modules in a browser extension context. Deno's bundle lets us concatenate the content of JS files that use `import`/`export` into a "bundles" file that does not include `import`/`export`. Perfect.
  - DONE One-by-one modularize the files marked as accessible in the `manifest.json` file. Only entrypoint-type files should exist by the end, like `init.js`, `popup.js` and `posts-viewer.js`.
    - DONE Modularize `rpc-web-page.js`
    - DONE Modularize `rpc.js`
    - DONE Modularize `jquery-proxy.js`
    - DONE Modularize everything
  - DONE fix modularization. The `Vote` class is getting double declared. I need to bundle `web-load-source.js` into the other web entrypoint files like `posts-viewer.js`
  - DONE Convert something to TypeScript
  - DONE convert more things to TypeScript
    - DONE `posts-viewer.js`
    - DONE `content-script-load-source.js`
    - DONE `popup.js`
  - ANSWERED How do source maps work with TypeScript/Deno? Can I still productively debug my code in Chrome Dev Tools? Answer: `deno bundle ...` does not support sourcemaps but it is an open issue with a show of support from the Deno core team.
-
DONE Fix the sort order of Q&As in the viewer. I'm seeing questions all bunched together and then answers bunched together right afterwards. Questions should always be followed by their answers, but this isn't happening. For example, this answer is not following its question.
- (Answer: yes, the "questionId" is a non-normal field and needed to be included in the toJSON) Is there a defect where the question ID field is null on answers? For example, answer 37943159 has a null question ID. Why? This is a problem for the sort order.
-
DONE Change the project name. Drop the "static" name and replace it with "extractor", or "viewer" or something like that.
-
DONE Defect. If you click the extension button more than once, it is problematic because it runs the content scripts every time, which means multiple window listeners are added because of `content-script-messaging-proxy.js`.
  - DONE When the popup is opened multiple times, the content scripts must skip the "load source" and "initialize RPC proxy" work. Use a flag on the `window` to keep track of the state.
  - DONE There is some other issue where if you execute "Scrape votes" multiple times, it just grows. Some old objects stay around. So when you execute it a second time, it kicks off two scrapers. And when you execute a third time, it kicks off three!
-
DONE Convert everything to TypeScript
  - DONE Convert the `init.js` files to TypeScript
  - DONE Convert `rpc-backend.js` to TypeScript
  - DONE Convert all of the RPC framework to TypeScript
  - DONE Convert web page stuff to TypeScript
-
OBSOLETE (now that TypeScript is in the picture, it is a strong counter force to this problem) This project has ballooned and I could really use some ESLint or something to do the undifferentiated heavy lifting of finding basic problems. For example, I changed the signature of the RPC client, and it's pretty easy to miss a call site and update the args.
-
DONE Modularize the source code layout. I want the `rpc/` code far away from the other code, so it's clear that it is a standalone component. Similarly, I want a `core/` component which is the core of the SO Look Back Tool and it should be far away from the vendor-specific code (stuff like web extension IDs and manifests)
-
DONE (It's not perfect, it should exit earlier. But I don't want to deal with Bash traps/catch yet) Fix the `build.sh` script to not exit when TypeScript compilation fails when the `--watch` option is used
-
DONE Clean up the relationship between `web-load-source.ts`, `posts-viewer.ts`, `web-injected.ts` and `content-script-bootstrapper.ts`.
  - This work depends on the completion of the `web-extension-framework/`.
-
DONE implement the `web-extension-framework`
  - DONE incorporate the `rpc-framework` into the `web-extension-framework`
  - Note: I am not consistent with the way I separate or fail to separate "this is for the web page" with "this is for a popup script". Sometimes I say, "this is for the web page and nothing else", but really it can be for a popup script too because a popup script has its own web page (sort of... it has a page-like thing...).
-
DONE Create a `BackendWiring` abstraction similar to `PageWiring`
-
DONE The web-extension-framework and rpc-framework should be migrated to their own repo. I will be very happy when I can remove all of that code from this repo and focus again on the Look Back Tool features!
- DONE Add code to a new repo: https://github.com/dgroomes/web-extension-framework
- DONE Delete the now redundant framework code
- DONE Depend on the new code as a Git sub-module
-
DONE Defect. The stackoverflow-look-back is not working. There have been changes to the site.
- DONE Replace Deno with Webpack and ts-loader. Similar to the work I did in the BrowserExtensionFramework.
- This is going to be at least a decent amount of work. It could be full of pitfalls. I at least proved out the BrowserExtensionFramework on Webpack and NPM and even validated it with the sample extension named Detect Code Libraries in that same project. How can I split this work into multiple, completable tasks?
- DONE (This was easy only because we're consuming BrowserExtensionFramework as a Git submodule and it has no NPM dependencies of its own. The effect is that we have full access to the source that we need to change (convert the import statements)) Update the Git submodule, build it, and then can I consume it from Deno? It might be possible if I turn off type validation... but I think the imports just won't work.
- DONE Migrate to Webpack and ts-loader, and use the latest BEF. Build BEF with `npm pack` and reference it from `stackoverflow-look-back` as an NPM dependency like `file:browser-extension-framework/browser-extension-framework/browser-extension-framework-0.1.0.tgz`
- OBSOLETE (The `search-ui` supports this) Multi-term search. The search bar should take each word and apply an "AND" search
- DONE (Done. Wow, Algolia made that easier than I could have imagined.) Exclude the `type` field from being searched. It doesn't matter much, but it's confusing to see it as a highlighted result. UPDATE: Algolia calls these "searchable attributes".
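The last two items above (an "AND" search over every word, and keeping the `type` field out of the searched text) can be sketched as a small in-memory filter. This is only an illustration under assumptions: the `Post` shape and the `searchPosts` function are hypothetical names, and the real tool delegates this work to Algolia rather than doing it by hand.

```typescript
// Hypothetical post shape, for illustration only. The real data model may differ.
interface Post {
  type: "question" | "answer";
  title: string;
  body: string;
}

// Fields that are searched. `type` is deliberately left out so it never matches.
const SEARCHABLE_FIELDS: ReadonlyArray<keyof Post> = ["title", "body"];

// Return the posts that contain every whitespace-separated term of the query ("AND" search).
function searchPosts(posts: Post[], query: string): Post[] {
  const terms = query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0);

  // An empty query matches everything.
  if (terms.length === 0) return posts;

  return posts.filter((post) => {
    const text = SEARCHABLE_FIELDS.map((field) => post[field]).join(" ").toLowerCase();
    return terms.every((term) => text.includes(term));
  });
}

const examplePosts: Post[] = [
  { type: "question", title: "How do I undo a git commit?", body: "I committed the wrong files." },
  { type: "answer", title: "", body: "Use git reset to move the branch pointer." },
];
```

With these example posts, searching for `git commit` keeps only the first post, and searching for `question` matches nothing because `type` is not part of the searched text.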
Here is some background on this project and some of my research which contextualizes the "why" of this project.
- Does StackOverflow already support this? stackoverflow.com does not have search functionality for posts that you've up-voted. By contrast, there is a way to search for posts that you've bookmarked (née favorited) using the search option `inbookmarks:mine`. See the search page https://stackoverflow.com/search for all search options. I've bookmarked 121 posts whereas I've up-voted 2,200 posts! I want search coverage on my votes (Hello StackOverflow, if you see this, consider this a feature request, or at least, a user experience data point! Thank you). Here are some related questions by other people:
- Why scrape the HTML for this data and not just query it via the Stack Exchange Data Explorer (SEDE)? Unfortunately, up-vote and down-vote data is private. It is anonymized in SEDE. The StackOverflow API also does not expose this data. So, it must be scraped from the HTML.
- This is a fun project for me
- I like JavaScript and the browser
- Why do I like the browser so much? Among other things, the MDN Web Docs are so amazing 🤩 and make it fun and rewarding to develop using Web APIs.
- This is a vehicle for me to learn TypeScript on a non-trivial project. I'm learning TypeScript with the help of Deno and its `bundle` command.
- The Chrome extension development experience is overall pretty good. I imagine it's much better than it was in the early years of Chrome. That said, it's difficult to debug the JavaScript code that runs in a service worker (the one defined by the `background.service_worker` field in the manifest). I find that 1) when it errors, there are no logs, just the infamous "Service worker registration failed" message in the "chrome://extensions" page and 2) I can't attach a debugger. The only thing I can do is comment out the whole file, then uncomment lines little by little and add `console.log` statements.
- How many execution contexts are there? 1) The JavaScript execution environment in the page, 2) the JavaScript execution environment that executes the extension code like the popups and 3) the JavaScript execution environment that runs the content scripts? For example, I need to understand this because I'm hitting a roadblock where I want to make a Proxy over jQuery on the web page, but a content script's execution environment doesn't have access to the web page's variables, though it does have access to the DOM (it seems arbitrary to allow one but block the other, but there is probably a good reason). There is a way to work around this problem anyway: inject a script element into the page itself from a content script. See this StackOverflow question and answer.
- The `let that = this` trick I have to use in the ES6 classes is a bit disappointing... how else could this code be designed? Is there an idiomatic ES6 class way? Or is this a quirk of classes? Answer: no, see this SO question. Update 2: well, in all cases arrow functions actually solve my problem (not sure if that's a good thing but I'll take it)!
- One of the significant changes of Chrome's Manifest V3 over Manifest V2 is the Action API unification
- I'm not sure how to do global state anymore since I've incorporated modules. In a browser extension context especially, a content script might be loaded multiple times, a web page script might be loaded multiple times and it's important for the subsequent loads to not have a negative effect. For example, the first load might initialize a listener object, and subsequent loads must not initialize a new listener object because that leads to "double listens" and other unintended side effects. Plus I'm confused how to declare global variables in TypeScript. I should stick them on the `window`, right?
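Two of the concerns above, keeping `this` bound without the `let that = this` trick and making sure a repeated script load does not create a second listener, can be sketched together. This is a sketch under assumptions, not code from this project: `VotesListener`, `installOnce` and the `__lookBackListener` global key are hypothetical names. It uses an arrow-function class property so `this` stays bound, and it stores the single instance on `globalThis` (which is the `window` object in a web page).

```typescript
// Hypothetical listener class. The arrow-function property captures `this`
// lexically, so it can be handed out as a callback without `let that = this`.
class VotesListener {
  received: string[] = [];

  onMessage = (message: string): void => {
    this.received.push(message);
  };
}

// Describe the extra global slot with a type instead of an untyped `window` property.
// The `__lookBackListener` key is a made-up name for this sketch.
type LookBackGlobals = typeof globalThis & { __lookBackListener?: VotesListener };

// Create the listener on the first call and reuse it on every later call,
// so loading the same script twice does not lead to "double listens".
function installOnce(): VotesListener {
  const globals = globalThis as LookBackGlobals;
  const existing = globals.__lookBackListener;
  if (existing !== undefined) return existing;

  const created = new VotesListener();
  globals.__lookBackListener = created;
  return created;
}

// Simulate the same script being loaded twice.
const firstLoad = installOnce();
const secondLoad = installOnce();

// Passing the method as a bare callback still works because of the arrow function.
const detachedCallback = firstLoad.onMessage;
detachedCallback("vote-1");
```

Both loads end up sharing one `VotesListener`, so a message delivered through the detached callback is recorded exactly once.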
Materials I referenced when building this tool and deep diving on learning.
- MDN Web docs: API docs for NodeList
- MDN Web docs: API docs for MutationObserver
- MDN Web docs: JavaScript modules
- MDN Web docs: toJSON() behavior
- MDN Web docs: "page_action"
- Note that the Manifest property `show_matches` (of `page_action`) is only supported in Firefox. By default, page actions are hidden in Firefox but by contrast, page actions are shown by default in other browsers. This was a surprising find to me because I couldn't see the page action in the URL bar and I was confused! I need to explicitly enable it with the `show_matches` property.
- MDN Web Docs: Manifest property "externally_connectable"
- The `externally_connectable` property is not supported in Firefox. An alternative must be used for message passing between the web page and the extension. See https://github.com/mdn/webextensions-examples/tree/master/page-to-extension-messaging.
- MDN Web Docs: the EventTarget APIs
- MDN Web Docs: Window postMessage API
- MDN Web Docs: runtime.sendMessage()
- MDN Web Docs: browserAction.onClicked
- MDN Web Docs: tabs.sendMessage()
- Send messages from background scripts to content scripts
- Chrome equivalent
- MDN Web Docs: extension storage API
- "Enables extensions to store and retrieve data, and listen for changes to stored items."
- Chrome extensions docs
- Chrome extension docs: chrome.webRequest
- Consider using this API to intercept requests instead of using a Proxy object on the web page
- Chrome extension docs: Manifest V2 Getting started
- Chrome extension docs: chrome.browserAction
- Meta Stack Exchange: Database schema for the Stack Exchange Data Explorer (SEDE)
- StackExchange: What are tags, and how should I use them?
- This describes the tag naming convention. E.g. `command-line`, `powershell`
- Multiple references on recommended/possible ways to render HTML dynamically from JS code in the browser (there are many but there is not an obvious choice!)
- dgroomes/web-playground/browser-extensions
- My own reference project for Chrome extensions
- Extension Workshop: Porting a Google Chrome extension
- Shoot, Firefox doesn't support Manifest V3 and I spent all this time writing a Chrome extension in Manifest V3. I wish I had implemented it in Manifest V2 so that I could have compatibility with Firefox.
- Extension Workshop
- A special Firefox site that is focused entirely on extension development.
- "Get help creating and publishing Firefox add-ons that make browsing smarter, safer, and faster."
- Bugzilla (Firefox bug tracker)
- You can't use symlinks in web extensions. This works in Chrome, so this type of issue wasn't on my radar and I spent a lot of time trying to track this issue down. I wonder if symlinks might work in Firefox Developer Edition? Update: no, it is the same in Firefox Developer Edition.
- GitHub repo: mozilla/web-ext
- I'm purposely choosing to not use this tool. I want to keep the dependencies to an absolute minimum and this tool is not critical.
- Opera dev docs: The Basics of Making an Extension
- Deno: "A modern runtime for JavaScript and TypeScript."
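The `externally_connectable` note in the list above says Firefox needs an alternative for messaging between the web page and the extension. One common alternative, in the spirit of the linked MDN example, is to pass messages through `window.postMessage` and have a content script listen for them. The following is a minimal sketch under assumptions: `WindowLike`, `PageMessage`, `sendFromPage` and `listenInContentScript` are hypothetical names, and the `forward` callback stands in for whatever the content script does next (for example, relaying the payload to the extension backend).

```typescript
// Minimal slice of the window API that this sketch relies on.
// It is written as an interface so the logic can also run outside a browser.
interface MessageLike {
  source: unknown;
  data: unknown;
}

interface WindowLike {
  postMessage(message: unknown, targetOrigin: string): void;
  addEventListener(type: "message", listener: (event: MessageLike) => void): void;
}

// Hypothetical envelope that marks messages sent by the page script.
interface PageMessage {
  direction: "from-page";
  payload: string;
}

// Runtime check, because anything on the page can post arbitrary messages.
function isPageMessage(data: unknown): data is PageMessage {
  if (typeof data !== "object" || data === null) return false;
  const candidate = data as Record<string, unknown>;
  return candidate.direction === "from-page" && typeof candidate.payload === "string";
}

// Web page side: post an envelope onto the page's own window.
// "*" keeps the sketch short; a specific origin is the stricter choice.
function sendFromPage(win: WindowLike, payload: string): void {
  const message: PageMessage = { direction: "from-page", payload };
  win.postMessage(message, "*");
}

// Content script side: accept only well-formed messages that came from this same window.
function listenInContentScript(win: WindowLike, forward: (payload: string) => void): void {
  win.addEventListener("message", (event) => {
    if (event.source !== win) return;
    if (!isPageMessage(event.data)) return;
    forward(event.data.payload);
  });
}
```

In a browser, both functions would be called with the real `window` object, once from the page script and once from the content script. Messages without the expected envelope are ignored, which matters because other scripts on the page can post to the same window.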