Last Update 6:51 AM November 20, 2024 (UTC)

Identity Blog Catcher

Brought to you by Identity Woman and Infominer.
Support this collaboration on Patreon!!!

Wednesday, 20. November 2024

Simon Willison

Foursquare Open Source Places: A new foundational dataset for the geospatial community


I did not expect this!

[...] we are announcing today the general availability of a foundational open data set, Foursquare Open Source Places ("FSQ OS Places"). This base layer of 100mm+ global places of interest ("POI") includes 22 core attributes (see schema here) that will be updated monthly and available for commercial use under the Apache 2.0 license framework.

The data is available as Parquet files hosted on Amazon S3.

Here's how to list the available files:

aws s3 ls s3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/

I got back places-00000.snappy.parquet through places-00024.snappy.parquet, each file around 455MB for a total of 10.6GB of data.

I ran duckdb and then used DuckDB's ability to remotely query Parquet on S3 to explore the data a bit more without downloading it to my laptop first:

select count(*) from 's3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/places-00000.snappy.parquet';

This got back 4,180,424 - that number is similar for each file, suggesting around 104,000,000 records total.
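To get the exact total without downloading anything, DuckDB can glob across all 25 files in a single query. Here's a minimal Python sketch of that, assuming the duckdb package with its httpfs extension can read the public bucket (you may need AWS credentials configured, as with the aws CLI listing above):

import duckdb

# In-memory DuckDB connection with S3 support via the httpfs extension
con = duckdb.connect()
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")
con.execute("SET s3_region = 'us-east-1';")

# Glob across all of the Parquet files in the release to count every record
total = con.execute("""
    select count(*)
    from 's3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/places-*.snappy.parquet'
""").fetchone()[0]

print(f"{total:,} places")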

Then I ran this query to retrieve 1,000 places from that first file as newline-delimited JSON:

copy ( select * from 's3://fsq-os-places-us-east-1/release/dt=2024-11-19/places/parquet/places-00000.snappy.parquet' limit 1000 ) to '/tmp/places.json';

Here's that places.json file, and here it is imported into Datasette Lite.

Finally, I got ChatGPT Code Interpreter to convert that file to GeoJSON and pasted the result into this Gist, giving me a map of those thousand places (because Gists automatically render GeoJSON):
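The Code Interpreter step can also be done locally in a few lines of Python. A minimal sketch, assuming each line of places.json is a single JSON object with latitude, longitude and name fields (check the published schema for the exact column names):

import json

features = []
with open("/tmp/places.json") as f:
    for line in f:
        place = json.loads(line)
        # Skip any records that are missing coordinates
        if place.get("latitude") is None or place.get("longitude") is None:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON uses [longitude, latitude] ordering
            "geometry": {"type": "Point", "coordinates": [place["longitude"], place["latitude"]]},
            "properties": {"name": place.get("name")},
        })

with open("/tmp/places.geojson", "w") as f:
    json.dump({"type": "FeatureCollection", "features": features}, f)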

Via Andy Baio

Tags: open-source, gis, foursquare, datasette-lite, parquet, duckdb, code-interpreter, ai-assisted-programming, geojson


Bluesky WebSocket Firehose


Very quick (10 seconds of Claude hacking) prototype of a web page that attaches to the public Bluesky WebSocket firehose and displays the results directly in your browser.

Here's the code - there's very little to it, it's basically opening a connection to wss://jetstream2.us-east.bsky.network/subscribe?wantedCollections=app.bsky.feed.post and logging out the results to a <textarea readonly> element.
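The same feed is easy to consume outside a browser too. Here's a rough Python sketch using the websockets package against that same Jetstream endpoint; the commit/record/text field access is an assumption based on the JSON the page logs, so treat it as illustrative and check the Jetstream docs for the exact event shape:

import asyncio
import json
import websockets

JETSTREAM = (
    "wss://jetstream2.us-east.bsky.network/subscribe"
    "?wantedCollections=app.bsky.feed.post"
)

async def main():
    # Open the WebSocket connection and stream events as they arrive
    async with websockets.connect(JETSTREAM) as ws:
        async for message in ws:
            event = json.loads(message)
            # Print the post text when present, otherwise the raw event
            text = event.get("commit", {}).get("record", {}).get("text")
            print(text or event)

asyncio.run(main())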

Bluesky's Jetstream isn't their main atproto firehose - that's a more complicated protocol involving CBOR data and CAR files. Jetstream is a new Go proxy (source code here) that provides a subset of that firehose over WebSocket.

Jetstream was built by Bluesky developer Jaz, initially as a side-project, in response to the surge of traffic they received back in September when Brazil banned Twitter. See Jetstream: Shrinking the AT Proto Firehose by >99% for their description of the project when it first launched.

The API scene growing around Bluesky is really exciting right now. Twitter's API is so expensive it may as well not exist, and Mastodon's community have pushed back against many potential uses of the Mastodon API as incompatible with that community's value system.

Hacking on Bluesky feels reminiscent of the massive diversity of innovation we saw around Twitter back in the late 2000s and early 2010s.

Here's a much more fun Bluesky demo by Theo Sanderson: firehose3d.theo.io (source code here) which displays the firehose from that same WebSocket endpoint in the style of a Windows XP screensaver.

Tags: websockets, bluesky, twitter, apis, mastodon

Tuesday, 19. November 2024

Simon Willison

OpenStreetMap vector tiles demo


Long-time OpenStreetMap developer Paul Norman has been working on adding vector tile support to OpenStreetMap for quite a while. Paul recently announced that vector.openstreetmap.org is now serving vector tiles (in Mapbox Vector Tiles (MVT) format) - here's his interactive demo for seeing what they look like.

Via Mark Litwintschik

Tags: gis, openstreetmap, mapping


Using uv with PyTorch


PyTorch is a notoriously tricky piece of Python software to install, due to the need to provide separate wheels for different combinations of Python version and GPU accelerator (e.g. different CUDA versions).

uv now has dedicated documentation for PyTorch which I'm finding really useful - it clearly explains the challenge and then shows exactly how to configure a pyproject.toml such that uv knows which version of each package it should install from where.
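For a flavor of what that configuration looks like, here's a rough pyproject.toml sketch along the lines of the pattern in the uv PyTorch guide: named indexes marked explicit, plus tool.uv.sources entries routing torch to the right index per platform. The index names, version pins and markers below are illustrative; check the uv documentation for the exact values for your setup.

[project]
name = "example"
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["torch>=2.5.0"]

# Route torch to a CPU build on macOS/Windows and a CUDA 12.4 build on Linux
[tool.uv.sources]
torch = [
    { index = "pytorch-cpu", marker = "platform_system != 'Linux'" },
    { index = "pytorch-cu124", marker = "platform_system == 'Linux'" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true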

Via @charliermarsh

Tags: packaging, python, uv, pytorch, pip


Understanding the BM25 full text search algorithm


Evan Schwartz provides a deep dive explanation of how the classic BM25 search relevance scoring function works, including a very useful breakdown of the mathematics it uses.
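For reference, the function the article unpacks is the standard Okapi BM25 score, with k_1 and b as tunable parameters, f(q_i, D) the frequency of query term q_i in document D, |D| the document length, avgdl the average document length, N the number of documents, and n(q_i) the number of documents containing q_i:

\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q_i) = \ln\!\left(\frac{N - n(q_i) + 0.5}{n(q_i) + 0.5} + 1\right)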

Via lobste.rs

Tags: search, algorithms


Notes from Bing Chat—Our First Encounter With Manipulative AI


I participated in an Ars Live conversation with Benj Edwards of Ars Technica today, talking about that wild period of LLM history last year when Microsoft launched Bing Chat and it instantly started misbehaving, gaslighting and defaming people.

Here's the video of our conversation.

I ran the video through MacWhisper, extracted a transcript and used Claude to identify relevant articles I should link to. Here's that background information to accompany the talk.

A rough timeline of posts from that Bing launch period back in February 2023:

Microsoft announces AI-powered Bing search and Edge browser - Benj Edwards, Feb 7, 2023
AI-powered Bing Chat spills its secrets via prompt injection attack - Benj Edwards, Feb 10, 2023
AI-powered Bing Chat loses its mind when fed Ars Technica article - Benj Edwards, Feb 14, 2023
Bing: “I will not harm you unless you harm me first” - Simon Willison, Feb 15, 2023
Gareth Corfield: I'm beginning to have concerns for @benjedwards' virtual safety - Twitter, Feb 15, 2023
A Conversation With Bing’s Chatbot Left Me Deeply Unsettled - Kevin Roose, NYT, Feb 16, 2023
It is deeply unethical to give a superhuman liar the authority of a $1 trillion company or to imply that it is an accurate source of knowledge / And it is deeply manipulative to give people the impression that Bing Chat has emotions or feelings like a human - Benj on Twitter (now deleted), Feb 16 2023
Bing AI Flies Into Unhinged Rage at Journalist - Maggie Harrison Dupré, Futurism, Feb 17 2023

Other points that we mentioned:

this AI chatbot "Sidney" is misbehaving - amazing forum post from November 23, 2022 (a week before even ChatGPT had been released) from a user in India talking about their interactions with a secret preview of Bing/Sydney
Prompt injection attacks against GPT-3 - where I coined the term "prompt injection" on September 12, 2022
Eight Things to Know about Large Language Models (PDF) is the paper where I first learned about sycophancy and sandbagging (in April 2023)
Claude’s Character by Anthropic talks about how they designed the personality for Claude - June 8 2023, my notes on that.
Why ChatGPT and Bing Chat are so good at making things up, in which Benj argues for the term "confabulation", in April 2023.

Tags: arstechnica, bing, ethics, microsoft, podcasts, talks, ai, openai, generative-ai, gpt-4, llms, benj-edwards


IdM Laboratory

By the way, there's a passkey piece up on NewsPicks


Hello, this is Fujie.

Come to think of it, I was interviewed by NewsPicks the other day, and it turns out the article is now out.

[The Truth] Passwords are disappearing. This is how convenient "login" has become

https://newspicks.com/news/10849029/body/

I think it's a good article, covering everything from the dangers of password reuse to multi-factor authentication and passkeys.

It's a paid article, though, so you can't read it to the end without registering.

Check it out if you're interested.

Simon Willison

Preview: Gemini API Additional Terms of Service


Google sent out an email last week linking to this preview of upcoming changes to the Gemini API terms. Key paragraph from that email:

To maintain a safe and responsible environment for all users, we're enhancing our abuse monitoring practices for Google AI Studio and Gemini API. Starting December 13, 2024, Gemini API will log prompts and responses for Paid Services, as described in the terms. These logs are only retained for a limited time (55 days) and are used solely to detect abuse and for required legal or regulatory disclosures. These logs are not used for model training. Logging for abuse monitoring is standard practice across the global AI industry. You can preview the updated Gemini API Additional Terms of Service, effective December 13, 2024.

That "for required legal or regulatory disclosures" piece makes it sound like somebody could subpoena Google to gain access to your logged Gemini API calls.

It's not clear to me if this is a change from their current policy though, other than the number of days of log retention increasing from 30 to 55 (and I'm having trouble finding that 30 day number written down anywhere.)

That same email also announced the deprecation of the older Gemini 1.0 Pro model:

Gemini 1.0 Pro will be discontinued on February 15, 2025.

Tags: gemini, google, generative-ai, ai, llms


The Pragmatic Engineer

Hiring software engineers and engineering leaders from Big Tech (Part 2)

Tactics and approaches for startups to hire software engineers with Big Tech experience, and why Amazon is a leading source of talent for early-stage businesses

Before we start: performance reviews and promotions are around the corner at many companies. With the end of the year closing in, these events will happen at most organizations in the next month or two. The best time to prepare is now – waiting longer might leave it too late. See the deepdives Preparing for promotions ahead of time and Preparing for performance reviews ahead of time for tips on what you can do, now, to ensure a fair outcome from these processes.

In the first part of this two-part series, we looked into why Big Tech hires sometimes don’t work out for startups, and also when recruiting from the biggest companies does work for new tech players.

Today, we cover how to recruit from Big Tech, with some tried-and-tested tactics for doing it successfully – and what to avoid, as well. For this, I talked with nearly a dozen startup founders and hiring managers with Big Tech backgrounds. Thank you to everyone who contributed!

In this deep dive, we cover:

When it’s VERY hard to hire Big Tech talent. Golden handcuffs, retention bonuses, upcoming promotions, and no equity on offer.

Which companies do engineers quit for early-stage startups? Google, Meta, Amazon, and Apple are frequent sources of such hires. A look at 30 companies from which “stealth startups” hire, and where founding engineers are recruited from.

The right time to make an offer. Four-year vesting cliffs, shifting stock prices, burnout, negative performance feedback, and other factors can drive talented professionals to seek opportunities beyond the biggest workplaces.

Flexibility. Full-remote work, asynchronous, part-time, and other flexible arrangements can make smaller companies more attractive to tech pros who value the flexibility which Big Tech mostly chooses not to provide.

Hiring outside of the US. Big Tech pay is location-dependent, so packages are significantly smaller outside the US – while still topping local markets. For startups, it’s often more feasible to hire Big Tech folks from the international job market.

Tips for hiring from Big Tech. Startup founders and senior engineers share what works for them in attracting Big Tech talent.

Amazon. The online retail giant is a major source of new hires for small companies for a good reason: it’s the most startup-like Big Tech company.

Google. Some startups have reservations about hiring directly from the tech leader.

The bottom of this article could be cut off in some email clients. Read the full article uninterrupted, online.

Read the full article online

Related articles:

Why techies leave Big Tech

Hiring software engineers and engineering leaders from Big Tech (Part 1)

State of the software engineering job market in 2024

Thriving as a Founding Engineer: lessons from the trenches

1. When it’s VERY hard to hire Big Tech talent

It’s never easy to attract talent from Big Tech for startup recruiters and leaders, and there are factors which make it harder.

Money

Golden handcuffs. When someone is on handsome compensation that’s paid over time, it’d be irrational for them to quit a Big Tech workplace. This is usually related to stock appreciation, or more rarely, generous retention bonuses. For example, most NVIDIA employees who joined in the last 2-4 years have “golden handcuffs”. We covered why rising stock prices make it hard to hire from public companies.

Pending retention bonus. Big Tech pays these to a small number of engineers and managers seen as top performers or key contributors. Retention bonuses may be cash or equity, and are paid after a set period, usually between 6-24 months, or in installments. If someone quit sooner, they’d say goodbye to a significant sum. Hiring such people usually involves offering an equivalent amount as a signing-on bonus.

Hiring from Netflix. The streaming service is a special case in Big Tech because it pays all-cash compensation, with the option for employees to choose how much of that compensation is taken as stock options, purchased at 40% of the stock price. For employees who choose mostly cash, this means it is impossible for small enterprises to compete with Netflix on cash. Here’s what it offers:

$400,000 to $600,000 for senior software engineers in the US

$600,000 to $800,000 for staff software engineer in the US

~€350,000 for staff software engineer in the EU

The only companies that can match liquid total compensation packages like these are Big Tech and publicly traded tech companies, which offer much lower base salaries and make up the rest with equity.

We previously covered Netflix introducing levels to replace its single senior software engineer level.

Update on 19 November: added details on the unique stock options program at Netflix. Thank you for the comments on this!

Timing

Close to promotion. Leveling up can mean a significant 25-30% jump in compensation within Big Tech. An engineer close to a promotion might want to wait and see what happens, before deciding whether to quit. Cycles are typically twice yearly at most large companies. 

We cover promotion advice in Preparing for promotions ahead of time.

Engaged in a project. As a general rule, engineers and managers generally dislike leaving a large project before it’s finished. When investing a lot of effort, most people want to see it through, and so delay new opportunities until a launch is over.

Upcoming annual bonus. A month or two before bonuses are revealed is a hard time to hire from Big Tech because people understandably want to collect their bonuses; especially as some Big Tech companies reveal them up front, like Meta. Big Tech bonus dates:

Google: March

Meta: March

Microsoft: September

Apple: October

Netflix: no bonuses

Offering no equity to new hires

There are small companies which offer a base salary and even a cash bonus to new hires, but no equity, which makes hiring from Big Tech close to impossible. People interested in quitting Big Tech generally accept their total compensation will take a hit, short term. However, the expectation is that comp will shoot back up if they help make a new company into a success. This is why equity stakes matter.

Companies offering massive cash bonuses are an exception, of which hedge funds are the best example. They typically pay a relatively small base salary, but pay cash bonuses several times bigger, depending on fund performance. Hedge funds in locations like New York City and London are probably the only places that can issue no equity while still attracting Big Tech engineers and managers.

Other exceptions:

Large title bump. Offering roles like CTO, or head of engineering, to a former Big Tech senior engineer or engineering manager is a career boost that’s usually not available at Big Tech. 

Relocation to a new country. People are usually okay with taking a cut when migrating. Coupled with a title bump, a Big Tech professional might be fine with joining a more traditional company, especially if the local market doesn’t have much else to offer. Note that for former Big Tech employees, remote work and local startups with VC funding are strong alternatives to a title upgrade.

Basically, if a for-profit company doesn’t offer an equity stake or big job title boost, then it should probably forget about hiring directly from Big Tech.

2. Which companies do engineers quit for early-stage startups?

Sure, it’s hard to recruit people from major tech companies to startups, but it’s far from impossible. Some businesses have a track record for making such hires, and I gathered some data on this, with the help of Live Data Technologies. It tracks real-time data on employment changes, and contributed to this publication’s recent report on the state of the software engineering job market in 2024.

Companies where founding engineers are hired from

Founding engineers are among the first software engineer hires at new startups. The data below shows where founding engineers come from:

[Chart: the companies that founding engineers are hired from. Source: Live Data Technologies]

Four of the world’s five largest tech companies are the top sources of founding engineers for smaller companies. Among them, Microsoft lags behind. Most businesses on this list are publicly traded, and it’s fair to assume plenty of engineers left the likes of Affirm or Instacart after they went public.

What I find surprising is that there are private companies from which plenty of founding engineers are hired, such as Stripe and Airtable. This is unusual: most engineers would typically wait for an IPO – and a payout – before leaving. What might explain these two companies is that Stripe organizes secondary stock sales (providing liquidity to current and past employees), and that Airtable let go of about half its employees in 2022-2023, as we previously covered.

In a separate deepdive, we covered How to thrive as a founding engineer.

Companies which “stealth startups” recruit from

When a software engineer updates their LinkedIn profile to say they work at “Stealth Startup,” it can mean one of several things:

They are cofounding a startup and do not want to announce it, yet. Doing so only when there is a product potentially means more press and hype to acquire users with.

They joined an early-stage startup which asks employees not to name it publicly, for the same reason as above.

They joined a company with a policy of not publicly sharing where staff work. A well-known company which does this is Cloud Kitchens, the “ghost kitchen” startup cofounded by former Uber CEO Travis Kalanick. Employees are not allowed to share where they work on any public network.

Tracking “stealth startup” profiles is a pretty good way to get a sense of early-stage companies and where they hire from. Here’s the data:

[Chart: the companies that “stealth startups” recruit from. Source: Live Data Technologies]

Google, Amazon, Meta, Microsoft, and Apple are the largest tech companies by numbers of employees, so it’s unsurprising they’re the most common source of “stealth startup” hires. Uber being so high could be due to the so-called “Uber mafia” network known for founding startups, as well as former Uber staff at Cloud Kitchens having to put “stealth startup” on their LinkedIn profiles.

It’s curious that hardware companies such as Intel, Cisco, Palo Alto Networks, VMWare, and NVIDIA are not leading sources for founding engineer hires. I wonder if this is because software startups are more likely to call their early hires “founding engineers”, or if ex-hardware company people are more likely to join hardware startups. If you have any thoughts on this, please drop a note in the comments!

The data confirms it’s eminently possible to hire from the largest tech companies when offering a founding engineer role, and when operating an early-stage, stealth startup. In Part 1, we previously covered tactics on how to poach workers from Big Tech; recruiting “founding engineers” was one method. 

3. The right time to make an offer

It’s critical that a startup knows when to approach candidates in Big Tech, who rarely think about quitting the industry’s largest workplaces. Obviously, much depends on individuals, but there are moments when people may be more open to the idea than usual.

Passing a four-year equity vesting cliff

For senior Big Tech hires, equity is a major part of total comp. Almost all the Big Tech companies issue generous initial equity grants which typically vest over 4 years. Top-ups can be issued, but it’s common for a Big Tech engineer’s total compensation in year 5 to drop lower than in years 1-4. This steep fall may be reason enough to start exploring alternatives. We cover plummeting comp in the deep dive, Why techies leave Big Tech.

As a founder or hiring manager at a startup, reaching out to Big Tech engineers who are at the 4-year mark could be the time when they’re most responsive to offers. 

It’s worth bearing in mind that a Big Tech employer could give a candidate a refresher or retention bonus if they’re seen as a standout performer. By softening the blow of the comp drop, this makes such people harder for a startup to recruit.

Declining stock value

For senior-and-above folks with 30% or more of their total compensation in stock, a downward stock price movement lasting 6+ months will cut their take-home pay. This is why Meta was a relatively “easy” place to hire from in late 2022, when its stock price was at a 7-year low. It’s also why many tech companies that IPO’d in 2020-21 and then saw a 50-80% stock price drop in 2022, became easier places to poach talent from.

If you are a founder or hiring manager at a startup, check the stock price trajectory of the companies which your candidates work at. Employees at places with standout stock performance are less likely to be interested in a switch, than at places which have lost significant share value.

We cover more on compensation drops, in numbers, in Why techies leave Big Tech.

After a stock-vesting milestone

Many people open to leaving Big Tech like to wait until the next major stock vest date before they decide. As a hiring manager at a startup, it can be useful to know some important dates for this. At Meta, stock vests are quarterly on 15 Feb, 15 May, 15 Aug and 15 Nov. When people leave, it’s usually after one of these dates.

Burnout

A founder of a pre-seed startup in California who hired an engineer from Meta and one from SpaceX, believes both these people were motivated by intense working conditions to seek more flexible working cultures elsewhere. They say:

“I think these two engineers were ready to jump mainly because they were both burnt out. The SpaceX hire was a new grad who was burnt out from doing 100 hour weeks (!). This person had made some additional cash from a tender offer at SpaceX.

The Meta employee was somewhat post-economic: they owned two houses, so felt they could take some risk in joining a startup.”

Poor performance feedback, PIPs

Big Tech employers use Performance Improvement Plans (PIP) to tell workers they’re not meeting expectations. Bouncing back after getting a PIP is possible, but it’s highly dependent on variables like manager relationships and other specifics. One place that’s notorious for using PIPs to essentially drive people out is Amazon. I’ve talked with a few engineers at the online retail giant; the consensus is that few people ever “clear” a PIP, and as little as 10% stay at Amazon. One engineer spoke with authority on the topic because they achieved exactly that. A problem with PIPs is that they slow down career trajectory:

Bonuses: Being on a PIP can mean no bonuses or pay raises in the next performance cycle. Interestingly, other places do award some bonuses to staff on PIPs. When I was at Uber, in one year people on a PIP were awarded 80% of their bonus targets – but the following year it was zero.

Career trajectory and promotions: Being on a PIP almost certainly precludes running for promotion in the next one or two promotion cycles, and will likely be raised when promotion is a possibility.

Job security: Workers on PIPs are in pole position to lose their jobs in the event of job cuts. When a larger cut happens, it’s common for “low performers” to be told their position is being eliminated.

“Below expectations” feedback can push engineers to consider options elsewhere because it is often a step towards a PIP, prevents internal transfers, and stifles career progression. For example, at Meta the ratings are “DNE” (Does Not Meet Expectations) and “MS” (Meets Some), as we cover in Inside Meta’s engineering culture.

Many engineers in Big Tech are ambitious people for whom a slowdown in career trajectory could be reason enough to look elsewhere. They’re more likely to start actively looking for a job when a PIP starts if they suspect they will not clear it, or if they believe their short-term career growth will be hampered even if they do.

Layoffs

It used to be very rare for Big Tech to do large layoffs, or even fire software engineers for relatively minor breaches. Those were different times. Recently, Meta unexpectedly fired staff in its LA office for spending $25 dinner vouchers on non-food items. For people not even directly affected by firings over vouchers, episodes like this can contribute to decisions to quit because they reveal how expendable workers are at the biggest companies. Some might be open to a small compensation cut in exchange for more job security.

Venture-funded startups can rarely offer more job stability than Big Tech, but profitable, bootstrapped businesses, or traditional companies can. Also, while startups aren’t famous for job stability, they are more “personable” places to work at than large corporations.

There’s a good reason why recruiters’ emails overwhelm the inboxes of engineers at companies doing job cuts: it works! It’s when the quantity of responses is highest from people affected, and also from colleagues who dodged the chop this time, but fear future rounds of job losses.

4. Flexibility

Big Tech companies (except Amazon) use hybrid setups, with 2-3 days spent working in the office each week. Startups can get a competitive advantage by offering more remote working to Big Tech employees. This may tempt Amazon workers who will soon be in the office all five days a week, following the online retail giant’s recent announcement. Other flexible work patterns are also available:

Read more


John Philpin : Lifestream

😂😂😂😂 Gosh darn - and here I am sitting just few rows back


😂😂😂😂

Gosh darn - and here I am sitting just a few rows back from said jet cockpit - and not a sticker to be found.


Ben Werdmüller

SF tech layoffs: Tales of post-pink-slip reinvention


[Jillian D'Onfro at The San Francisco Standard]

On one level, this piece about tech workers leaving the industry behind and doing something more culturally meaningful is quite nice:

"Andrew Wasilewski, who managed to live “very frugally” on his layoff package while launching the Faight Collective, a music and art community in the Lower Haight, signed the lease for the space mere weeks after his last day of work in tech sales."

But then you find yourself asking: how does an artist community pay a lease on the Lower Haight? How do any of these folks live like this, even for a while, in one of the most expensive cities in the world?

And the answer is obvious, and a little sad, and perhaps not very empowering after all.

#Culture

[Link]


Inside UnitedHealth’s Playbook for Limiting Mental Health Coverage


[Annie Waldman at ProPublica]

UnitedHealth Group has been using an algorithm to determine whether patients have been receiving "too much" therapy and then cutting them off:

"Around 2016, government officials began to pry open United’s black box. They found that the nation’s largest health insurance conglomerate had been using algorithms to identify providers it determined were giving too much therapy and patients it believed were receiving too much; then, the company scrutinized their cases and cut off reimbursements."

The kicker here is the regulatory arbitrage: the practice has been ruled illegal in three states so far, but United simply moves its activities to a state where it's still legal. And because it doesn't answer to a single regulator, it's hard to impose stronger rules. In fact, more than 50 regulators each have jurisdiction over small slices of United's activities.

Effectively that makes it ungovernable:

"For United’s practices to be curbed, mental health advocates told ProPublica, every single jurisdiction in which it operates would have to successfully bring a case against it."

And:

"State regulators are supposed to be making sure private insurers that manage Medicaid plans are following the mental health parity laws. But this year, a federal audit found that they were failing to do so. “They are not well designed to essentially be watchdogs,” Lloyd said. “There’s very little accountability. Insurers can run roughshod over them.”"

In other words, the system needs to be radically overhauled if patients are going to receive adequate care. Will it be? Perhaps not soon.

#Health

[Link]


Microsoft and Google incubated Chinese startups tied to police


[Joanna Chiu and Viola Zhou at Rest of World]

Tech companies like Microsoft and Google have, through their accelerators, supported startups that provide censorship and policing technologies in China. It's perhaps not a surprise that they've supported these endeavors - after all, startups look to find product/market fit in their regions - but it flies in the face of efforts they've made to appear to care about human rights.

I've been thinking about this a lot:

"Support for the companies through their startup incubator programs raises questions about the future of these initiatives, especially as Donald Trump prepares to take a second term as president."

We know that tech companies comply with authoritarian regimes when they try to do business there. There's a long history of that, from IBM colluding with the Nazis through Yahoo giving up the identities of bloggers to the Chinese authorities. What happens when their home turf becomes one? I don't think we can expect anything other than collaboration.

At this point, that's mostly speculation (beyond existing contracts with ICE, say) - but there's no doubt that surveillance and censorship have been used in China to squash dissent and commit human rights abuses. The tech companies who directly fund the infrastructure to do this are complicit, and should be publicly held as such.

#Technology

[Link]


John Philpin : Lifestream

🪦 so long Pete - thank you - I guess you’re now on the insid


🪦 so long Pete - thank you - I guess you’re now on the inside - looking outside - is it very different?


Simon Willison

Security means securing people where they are


William Woodruff is an Engineering Director at Trail of Bits who worked on the recent PyPI digital attestations project.

That feature is based around open standards but launched with an implementation against GitHub, which resulted in pushback (and even some conspiracy theories) that PyPI were deliberately favoring GitHub over other platforms.

William argues here for pragmatism over ideology:

Being serious about security at scale means meeting users where they are. In practice, this means deciding how to divide a limited pool of engineering resources such that the largest demographic of users benefits from a security initiative. This results in a fundamental bias towards institutional and pre-existing services, since the average user belongs to these institutional services and does not personally particularly care about security. Participants in open source can and should work to counteract this institutional bias, but doing so as a matter of ideological purity undermines our shared security interests.

Via lobste.rs

Tags: python, security, pypi, github

Monday, 18. November 2024

IdM Laboratory

I'll be speaking at the Taiwan Digital Wallet International Forum 2024

Hello, this is Fujie.
Next month, on Wednesday, December 11, I'll be speaking at the "Taiwan Digital Wallet International Forum 2024" hosted by the Taiwanese government.
I'll be giving a keynote and joining a panel discussion, where I plan to talk about topics such as interoperability between existing systems and the wallet ecosystem. Markus Sabadello will also be speaking in the second half, which I'm really looking forward to.
The venue is the GIS MOTC Convention Center, International Conference Hall in Taiwan. If you happen to be in Taipei, please come by.
You can register here: https://seminars.tca.org.tw/D10s00033.aspx
Here's the agenda.

Time | Topic | Speakers

10:30-10:45 | Opening Remarks | Yi-Jing Lin, Deputy Minister, Ministry of Digital Affairs

10:45-11:20 | Keynote Speech 1: How to achieve interoperability with the current ecosystems | Naohiro Fujie, Chairman, OpenID Foundation Japan

11:20-12:00 | Panel Discussion 1: Building a Cross-domain Ecosystem for Digital Wallets | Moderator: Nicole T.I Chan, Chairperson, Digital Trust Association in Taiwan; Panelists: Amber Zhong, Payment Partner Development Manager, SG & GRCN, Google; Jeng Linda Wen, Founder & CEO, Digital Self Labs LLC; Naohiro Fujie, Chairman, OpenID Foundation Japan

12:00-13:30 | Lunch Networking

13:30-14:00 | Keynote Speech 2: EU Digital Identity Wallet & The pursuit of Self-Sovereignty | Markus Sabadello, Founder, Danube Tech

14:00-14:30 | Keynote Speech 3: Creating a citizen-first digital wallet blueprint | Moderator: Yen-Lin Huang, Web3 Architect, Ministry of Digital Affairs; Speakers: Denken, Tech Lead, NYCU Innovation Research Lab; Angela Sher, Director, DreamVok; Chang En Li, Section Chief, Enterprise Business Group, Chunghwa Telecom Co., Ltd.

14:30-14:50 | Tea Break

14:50-15:20 | Fireside Chat: Digital Sovereignty and Physical Privacy - Digital Footprints, Anti-tracking, and Privacy Enhancement as a Vision | Alexis Hancock, Director of Engineering, Electronic Frontier Foundation; Singing Li, CEO, Open Culture Foundation

15:20-16:00 | Panel Discussion 2: How to Build a Trusted Digital Economy Ecosystem | Moderator: Wei-Chung Hwang, Director, Industrial Technology Research Institute (ITRI); Panelists: Hung-Yi Tu, Chief Strategy Officer, TWCA; Jeff Kuo, Co-Founder & CEO, Gogolook; Karen Chang, Chair of FIDO Taiwan Engagement Forum



Simon Willison

Pixtral Large


New today from Mistral:

Today we announce Pixtral Large, a 124B open-weights multimodal model built on top of Mistral Large 2. Pixtral Large is the second model in our multimodal family and demonstrates frontier-level image understanding.

The weights are out on Hugging Face (over 200GB to download, and you'll need a hefty GPU rig to run them). The license is free for academic research but you'll need to pay for commercial usage.

The new Pixtral Large model is available through their API, as models called pixtral-large-2411 and pixtral-large-latest.

Here's how to run it using LLM and the llm-mistral plugin:

llm install -U llm-mistral
llm keys set mistral # paste in API key
llm mistral refresh
llm -m mistral/pixtral-large-latest describe -a https://static.simonwillison.net/static/2024/pelicans.jpg

The image shows a large group of birds, specifically pelicans, congregated together on a rocky area near a body of water. These pelicans are densely packed together, some looking directly at the camera while others are engaging in various activities such as preening or resting. Pelicans are known for their large bills with a distinctive pouch, which they use for catching fish. The rocky terrain and the proximity to water suggest this could be a coastal area or an island where pelicans commonly gather in large numbers. The scene reflects a common natural behavior of these birds, often seen in their nesting or feeding grounds.

Update: I released llm-mistral 0.8 which adds async model support for the full Mistral line, plus a new llm -m mistral-large shortcut alias for the Mistral Large model.
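The same call can be made from Python via LLM's programmatic API. A minimal sketch, assuming llm and llm-mistral are installed, a key has been set with llm keys set mistral, and the attachment support that landed in LLM 0.17; treat the exact arguments as something to double-check against the LLM docs:

import llm

# Load the Pixtral Large model registered by the llm-mistral plugin
model = llm.get_model("mistral/pixtral-large-latest")

# Prompt with an image attachment, mirroring the CLI example above
response = model.prompt(
    "Describe this image",
    attachments=[llm.Attachment(url="https://static.simonwillison.net/static/2024/pelicans.jpg")],
)
print(response.text())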

Via @dchaplot

Tags: vision-llms, mistral, llm, generative-ai, ai, llms


Qwen: Extending the Context Length to 1M Tokens


The new Qwen2.5-Turbo boasts a million token context window (up from 128,000 for Qwen 2.5) and faster performance:

Using sparse attention mechanisms, we successfully reduced the time to first token for processing a context of 1M tokens from 4.9 minutes to 68 seconds, achieving a 4.3x speedup.

The benchmarks they've published look impressive, including a 100% score on the 1M-token passkey retrieval task (not the first model to achieve this).

There's a catch: unlike previous models in the Qwen 2.5 series it looks like this one hasn't been released as open weights: it's available exclusively via their (inexpensive) paid API - for which it looks like you may need a +86 Chinese phone number.

Via @alibaba_qwen

Tags: llms, ai, qwen, generative-ai


Wrench in the Gears

Broken Hearts, Wild Boundaries, and Garden Meditations


There’s a gentle rain coming down tonight, and I’m enjoying the soothing sounds of it in my cozy sunroom. I feel like I’m stealing time here. This space is enclosed by rickety old metal windows, and with the drafts and lack of insulation, I expect it will be off-limits in a few weeks for the duration of the winter season.

I spent the day in my little yard, as I have done for the past three weeks prepping garden beds for next year’s growing season. To tide me over, I planted some rocket and coriander last month in the existing beds next to the house along with a flat of cheerful pansies that have mostly survived the deer and rabbits. I do love poached eggs on a bed of spicy arugula.

As far as “clearance” tasks go, so far I’ve engaged with creeping poison ivy (not sure if it will be completely vanquished in round one) and lots of strangling honeysuckle (I salvaged a few sections of lovely spirals, saved in a vase for future heart intentions). I disposed of a rather vicious cat briar bramble (in tiny bits, with only a slight loss of blood) as well as an onslaught of crape myrtle scale (trunks cut down and stacked for burning next spring, as I don’t want to put it out for garden collection and inadvertently spread it around). I have a large stash of fallen leaves stored in the garage for mower shredding and mulching under the greatly thinned out shrubs.

I’ve distributed two pallets of Oklahoma rocks as edging around the yard and moved eight yards of hardwood mulch.  I’ll need more before it’s all over, though perhaps I can make do with visits to the municipal compost facility where I go to get bins of leaf mulch to lighten the Garland County clay soil. Many bags of composted manure have been added. There are earthworms in the front yard, but not much sign of life in the back. I’m hoping the addition will add a bit more zest to the proceedings and maybe in the spring I can do a Korean natural farming inoculation.

I have four metal raised garden beds waiting to be assembled in the basement, but that is a task that will have to wait until December. My goal for the next two weeks is to complete 3-4 more beds and whittle down the remaining “super-soil” mix that has been a hulking presence in my front yard since Halloween. I also need to relocate my garlic from the refrigerator into the ground and plant two boxes of “deer resistant” spring bulbs, with the help of an exuberant four-year-old neighbor on the block. Afterwards I promised to celebrate our accomplishment with a tea party.

The two heart-shaped beds that surround decaying stumps from the former owners in the front yard have been inaugurated with perennials and a $6 hybrid tea rose salvaged from the end of season clearance racks at Lowe’s – coreopsis, balloon flower, lobelia cardinalis, and sweet William, a favorite of my great aunt Vivi. I added a few basil cuttings I’d rooted from a bundle I’d gotten at the farmer’s market. Sure, they won’t overwinter, but it makes me happy to look at their greenery out the picture window and gather garnishes a few times a week.

I installed three ferns under the azalea out front where the grass won’t grow. When I bought the house, the shrub was a rather nondescript lump, but I cleared out the dead wood, shaped it a bit, and it rebounded nicely with a trimmer, more elegant structure (ah, if only I could say the same for myself). In China azaleas are known as the “thinking of home bush” (sixiang shu), which is somewhat bittersweet. There are two empty nests inside – perhaps symbolic of my old home and my new one here in Hot Springs? This is bird land and often jays or cardinals or wrens hop into its branches and peer at me inside as I do my work. They’re good company.

Outside my front door a petite wooden Don Quixote, a gift from a friend, stands guard next to my planter. I repotted the marigold that was languishing in an old fiberglass urn a few months ago and even though it has gotten a bit leggy, it continues to offer a bright pop of cheerful color. The strange thing, and tell me if this is common, is that in the past few weeks the new blossoms coming on are a clear, bright yellow instead of the red/orange combo on the rest of the plant. These flowers are on the existing branches, right next to the regular ones – strange. There is also a mother spider who had been tending a rather large egg case for many weeks. I was kind of hoping for a bit of “Some Pig” action, but nothing seems to be happening. The spider appears to be a devoted mother. We are running out of time before frost. I hope everything is ok.

Today was overcast with a slight drizzle, which made sod removal easier. The lawn here is a hodgepodge of random grasses and weeds. I’m ok with that and the neighbors aren’t snobs about appearances, which is nice. Over time, I hope to have much less grass and many more flowers. Yesterday I started on my “broken heart” project. I wanted to plant bulbs next to the driveway, but I didn’t want to line the pavement, since I need to cross over to get to the hose and to mow and roll out the trash cans. Instead, given the lay of the land, I decided to create two half-heart beds, one on either side. Even though my heart has been broken, and there is pain associated with it, the brokenness has also opened new opportunities for me and I am embracing them with gratitude.

The stones I chose for the borders are somewhat rounded, but irregular. I know that I will have to go back and create a dirt edge beyond them for the mower, but I love the casual cottage look they provide. I’m adding lots of curves to a lawn that was very square, creating little “rooms.” I feel like a little girl again, playing in the empty lots and on the dirt piles around our subdivision created from a rolling Kentucky farm in the 1970s. I selected a pallet of rocks that were manageable for me to move. Each is the size of a sourdough loaf or smaller. In laying them out I tried to match the edges, so that they nestled against one another. I’ve read that soil heave may topple some of the stones and that they need to be lifted and reset every few years. I guess that will be my workout. I’m ok with that. This nestling and unsettling seems symbolic for relationships that once felt solid, but given a particularly harsh cold snap were swiftly upended.

Once I move the remaining pile of “super soil,” I plan to put a circle bed in place in the side yard over the dead grass, an intention for renewal and future reconnection – infinite potential. In it I plan to plant iris saved from underneath the overgrown shrubbery in the backyard, red spider lilies gathered from around the stumps, and, when it gets warmer, tall sunflowers, okra, and amaranth recently gifted to me by post. The sunflowers are in honor of my dad, a Kansas boy. I want to grow lots and lots of okra, a relative of the hibiscus. I already have three types of seed waiting in the cupboard. I am going to learn to cook it in many, many ways. Lately I’ve been hanging out with folks who host vegan potlucks and I am gradually expanding my repertoire.

Behind the pile is the neighbor’s chain link fence with aging wooden ties that are composting at the base of it. There, soil and seeds collect, and ants make their home. Lots of weed trees grow out of this fence line. I’m on the downslope side, so I guess it is my lot to try and figure out how to manage the coppiced stumps that had already weathered years of prunings before I arrived on the scene.

In my kitchen is a sweet hand-painted plaque someone made me with dandelions and the saying “Grow in the Cracks.” I love that idea – it is on the wild borders where the magic happens. I know I am never going to eliminate “the weeds” along this fence. They were here before me and will very likely be here after me, since I’m not inclined to use brush killer on them. In the coming season we will learn to be with each other, and perhaps the pear and privet, oak and gum trees will offer their tender shoots to fuel my compost pile.

As I dug into the lawn I could see where the topsoil hauled in gave way to the slate and clay of the old, old ridge line below. These are the bones on which my little brick ranch house sits, the remnants of an ancient sea hundreds of millions of years old stacked up when what is now South America slammed into what is now North America and created the Ouachitas, one of only a few ranges in the United States that runs east-west. The orange cream color of the hot springs sandstone looks like sherbet. Supposedly the water that creates our hot springs percolates out of the sandstone, but my hunch is that it may actually be primary water resulting from volcanic activity in nearby Magnet Cove.

For me gardening has become a meditative practice, not intentionally so, but I recognize it now as a time when ideas come to me as my hands are busy and my mind is open to wandering. Today I probably spent three hours turning and sifting through the soil removing roots and rocks and sorting them for compost and lining a small swale I made in the back to route the downspout water away from the garden shed. I am terrible about wearing gardening gloves (thank goodness for Gardener’s Hand Therapy lotion), but it feels so good to connect with this land and feel it directly. I couldn’t help but remember John Trudell’s saying that we are shapes of the Earth, of the Biblical story of Adam formed of clay. The idea that treasures exist all around us, even beneath the ground upon which we walk – whether wriggling earthworms or a “smile.”

A heart broken open has room to grow. When I was at a very difficult point in my life last year, I told myself that I was going to create a garden and make my corner of the world as beautiful as I could. I feel like this process of being quiet in a place where the song of birds surrounds me and the pine trees offer a verdant embrace is just what I need right now. I have been blessed with a relatively blank slate to work with, a sunny southern slope on a quiet block with nice neighbors. This is my path towards healing. I am ready to move on from the trauma we have been navigating for the past several years into a new state of mind. Being on garden time, I can cultivate more detachment from the drama of digital space while finding solace in the simple things – accomplishments like a brush pile whittled down or an asparagus bed ready for the arrival of crowns next spring. The bulbs sitting in the boxes in my garage, each a blueprint of embodied joy awaiting their day in the sun a few months from now.

Below are some recent videos inspired, in part, by my days puttering around my yard. It is a random assortment, but perhaps there will be something of use for you as you walk your own road. Use your brain, think about your thinking, try to not go too deep into any one “reality tunnel.” The archetype of curious observer has been serving me well. Keep your eyes open for the universe’s playful possibilities. I have found operating from a place of bemused wonder is preferable to operating from a place of fear. In this community there are monthly kirtans where people gather to sing mantras together led by a local woman with a beautiful voice who plays a harmonium. I’ve never been much for yoga, but singing yoga for me is great. One of the mantras is directed to Ganesha, remover of obstacles. I am picturing my careful removal of the cat briars, the painful tangles we are all confronted with in life, knowing that our consciousness in connection with the divinity of the universe is very powerful. May we go forward with love and confidence that we are here for a purpose even though the way may at times be quite murky.

[Embedded videos]


Simon Willison

Quoting Jack Clark


The main innovation here is just using more data. Specifically, Qwen2.5 Coder is a continuation of an earlier Qwen 2.5 model. The original Qwen 2.5 model was trained on 18 trillion tokens spread across a variety of languages and tasks (e.g, writing, programming, question answering). Qwen 2.5-Coder sees them train this model on an additional 5.5 trillion tokens of data. This means Qwen has been trained on a total of ~23T tokens of data – for perspective, Facebook’s LLaMa3 models were trained on about 15T tokens. I think this means Qwen is the largest publicly disclosed number of tokens dumped into a single language model (so far).

Jack Clark

Tags: jack-clark, generative-ai, training-data, ai, qwen, llms


llm-gemini 0.4


New release of my llm-gemini plugin, adding support for asynchronous models (see LLM 0.18), plus the new gemini-exp-1114 model (currently at the top of the Chatbot Arena) and a -o json_object 1 option to force JSON output.

I also released llm-claude-3 0.9 which adds asynchronous support for the Claude family of models.
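As a rough sketch of what the new async support looks like from Python, assuming LLM 0.18, the llm-gemini plugin and a configured Gemini API key (method names per the LLM 0.18 async documentation, worth verifying against the docs):

import asyncio
import llm

async def main():
    # get_async_model() is the LLM 0.18 counterpart to get_model()
    model = llm.get_async_model("gemini-exp-1114")
    # Prompts return an async response; await .text() for the full output
    response = model.prompt("Describe the AT Protocol in one sentence")
    print(await response.text())

asyncio.run(main())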

Tags: llm, plugins, ai, llms, async, python, generative-ai, projects, claude, gemini, anthropic, google


Damien Bod

ASP.NET Core BFF using OpenID Connect and Vue.js


This article shows how to implement a secure web application using Vue.js and ASP.NET Core. The web application implements the backend for frontend security architecture (BFF) and deploys both technical stacks as one web application. HTTP only secure cookies are used to persist the session. OpenIddict is used as the identity provider and the token issuer.

Code: https://github.com/damienbod/bff-aspnetcore-oidc-vuejs

Overview

The solution is deployed as a single OpenID Connect confidential client using a separate OpenIddict identity provider. The OpenID Connect client authenticates using the code flow with PKCE and a secret or a certificate. The UI part of the solution is deployed as part of the server application. Secure HTTP only cookies are used to persist the session after a successful authentication. No security flows are implemented in the client part of the application. No sensitive data like tokens are exposed in the client browser. By removing the security from the client, the security is improved and the complexity is reduced.

Setup Vue.js application

The Vue.js UI is set up so that the default development environment is used, just like in any standalone Vue.js application. A reverse proxy is used to integrate the application into the secure backend development environment. The UI uses Vue.js 3 with TypeScript and Vite.

HTTPS setup and Production build

The production build is used to add the application as a UI view in the server-rendered application, in this case ASP.NET Core. I always use HTTPS in development so that errors are discovered early and a strong CSP can also be used. This is all set up in the Vite project file.

import { defineConfig } from 'vite'
import vue from '@vitejs/plugin-vue'
import fs from 'fs';

// https://vitejs.dev/config/
export default defineConfig({
  plugins: [vue()],
  server: {
    https: {
      key: fs.readFileSync('./certs/dev_localhost.key'),
      cert: fs.readFileSync('./certs/dev_localhost.pem'),
    },
    port: 4202,
    strictPort: true, // exit if port is in use
    hmr: {
      clientPort: 4202,
    },
  },
  optimizeDeps: {
    force: true,
  },
  build: {
    outDir: "../server/wwwroot",
    emptyOutDir: true
  },
})

CSP setup

The CSP is set up to use nonces in both development and production. This saves time fixing CSP issues before you go live. Vue.js creates scripts and styles during a build or an npm dev run (Vite). The scripts require the nonce, and the styles require a nonce in production. To add the server-created nonce, the index.html file uses a meta tag in the header, and the server-rendered middleware parses the markup for scripts and styles. The nonce is added and updated with a new value on every HTTP response and can be used directly in the Vue.js code. When adding further scripts statically or dynamically, the nonce placeholder can be used; it is updated dynamically in both development and production environments.

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="CSP_NONCE" content="**PLACEHOLDER_NONCE_SERVER**" />
    <link rel="icon" type="image/svg+xml" href="/vite.svg" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Vite + Vue + TS</title>
  </head>
  <body>
    <div id="app"></div>
    <script type="module" src="/src/main.ts"></script>
  </body>
</html>

The ASP.NET Core _Host file is used to serve up the index.html and adds the dynamic bits to the Vue.js application. The scripts and styles have a nonce applied in production, and the scripts in the development environment. Adding and replacing the CSP nonce can be done in different ways and needs to match the Vue.js index.html, so this can change depending on how the Vue.js index.html is set up.

@page "/"
@namespace BlazorBffAzureAD.Pages
@using System.Net;
@using NetEscapades.AspNetCore.SecurityHeaders;
@addTagHelper *, Microsoft.AspNetCore.Mvc.TagHelpers
@addTagHelper *, NetEscapades.AspNetCore.SecurityHeaders.TagHelpers
@inject IHostEnvironment hostEnvironment
@inject IConfiguration config
@inject Microsoft.AspNetCore.Antiforgery.IAntiforgery antiForgery
@{
    Layout = null;
    var source = "";
    if (hostEnvironment.IsDevelopment())
    {
        var httpClient = new HttpClient();
        source = await httpClient.GetStringAsync($"{config["UiDevServerUrl"]}/index.html");
    }
    else
    {
        source = System.IO.File.ReadAllText($"{System.IO.Directory.GetCurrentDirectory()}{@"/wwwroot/index.html"}");
    }

    var nonce = HttpContext.GetNonce();

    // The nonce is passed to the client through the HTML to avoid sync issues between tabs
    source = source.Replace("**PLACEHOLDER_NONCE_SERVER**", nonce);

    var nonceScript = $"<script nonce=\"{nonce}\" type=";
    source = source.Replace("<script type=", nonceScript);

    // link rel="stylesheet"
    var nonceLinkStyle = $"<link nonce=\"{nonce}\" rel=\"stylesheet";
    source = source.Replace("<link rel=\"stylesheet", nonceLinkStyle);

    var xsrf = antiForgery.GetAndStoreTokens(HttpContext);
    var requestToken = xsrf.RequestToken;

    // The XSRF-Tokens are passed to the client through cookies, since we always want the most up-to-date cookies across all tabs
    Response.Cookies.Append("XSRF-RequestToken", requestToken ?? "", new CookieOptions()
    {
        HttpOnly = false,
        IsEssential = true,
        Secure = true,
        SameSite = SameSiteMode.Strict
    });
}

@Html.Raw(source)

Anti-forgery protection

Cookies are used to store the session authentication. The authentication cookie is an HTTP-only secure cookie scoped to its domain. Browser SameSite protection helps secure the session, but old browsers do not support SameSite, so anti-forgery protection is still required. You can add this protection in two ways: I use a CSRF anti-forgery cookie, but you could also use custom headers with validation. The getCookie script reads the anti-forgery cookie created by the server. This cookie is not HTTP-only because it needs to be read into the UI.

export const getCookie = (cookieName: string) => {
  const name = `${cookieName}=`;
  const decodedCookie = decodeURIComponent(document.cookie);
  const ca = decodedCookie.split(";");
  for (let i = 0; i < ca.length; i += 1) {
    let c = ca[i];
    while (c.charAt(0) === " ") {
      c = c.substring(1);
    }
    if (c.indexOf(name) === 0) {
      return c.substring(name.length, c.length);
    }
  }
  return "";
};

The anti-forgery header is added to every API call that requires it. I use axios to request API data, and the header needs to be added to the axiosConfig. For the demo, I just implemented this directly in the Vue.js component. The component makes various API calls.

<script setup lang="ts">
import ResultsDisplay from './ResultsDisplay.vue'
import axios from 'axios';
import { ref, onMounted } from 'vue'
import { getCookie } from '../getCookie';

const isLoggedIn = ref<boolean>()
const currentUser = ref<any>()
const jsonResponse = ref<any>()

onMounted(() => {
  getUserProfile()
})

const axiosConfig = {
  headers:{
    'X-XSRF-TOKEN': getCookie('XSRF-RequestToken'),
  }
};

// request.headers.set('X-XSRF-TOKEN', getCookie('XSRF-RequestToken'));

function getDirectApi() {
  axios.get(`${getCurrentHost()}/api/DirectApi`, axiosConfig)
    .then((response: any) => {
      jsonResponse.value = response.data;
      return response.data;
    })
    .catch((error: any) => {
      alert(error);
    });
}

function getUserProfile() {
  axios.get(`${getCurrentHost()}/api/User`)
    .then((response: any) => {
      console.log(response);
      jsonResponse.value = response.data;
      if(response.data.isAuthenticated){
        isLoggedIn.value = true;
        currentUser.value = response.data.claims[0].value
      }
      return response.data;
    })
    .catch((error: any) => {
      alert(error);
    });
}

function getCurrentHost() {
  const host = window.location.host;
  const url = `${window.location.protocol}//${host}`;
  return url;
}
</script>

<template>
  <div class='home'>
    <a class="btn" href="api/Account/Login" v-if='!isLoggedIn'>Log in</a>
    <div v-if='isLoggedIn'>
      <form method="post" action="api/Account/Logout">
        <button class="btn btn-link" type="submit">Sign out</button>
      </form>
    </div>
    <button v-if='isLoggedIn' class='btn' @click='getUserProfile'>Get Profile data</button>
    <button v-if='isLoggedIn' class='btn' @click='getDirectApi'>Get API data</button>
    <ResultsDisplay v-if='isLoggedIn' v-bind:currentUser='currentUser' v-bind:jsonResponse='jsonResponse' />
  </div>
  <p class="read-the-docs">BFF using ASP.NET Core and Vue.js</p>
</template>

<style scoped>
.read-the-docs {
  color: #888;
}
</style>

Setup ASP.NET Core application

The ASP.NET Core project is set up to host the static HTML files from Vue.js and to respond to all HTTP requests defined by the APIs. The nonce is added to the index.html file. The standard ASP.NET Core OpenID Connect handler is used to authenticate the user and the application, and the session is stored in a cookie. The NetEscapades.AspNetCore.SecurityHeaders NuGet package is used to add the security headers and the CSP.

using BffOidc.Server;
using BffOidc.Server.Services;
using Microsoft.AspNetCore.Authentication.Cookies;
using Microsoft.AspNetCore.Authentication.OpenIdConnect;
using Microsoft.AspNetCore.Mvc;
using Microsoft.IdentityModel.JsonWebTokens;
using Microsoft.IdentityModel.Logging;
using Microsoft.IdentityModel.Protocols.OpenIdConnect;
using Microsoft.IdentityModel.Tokens;
using NetEscapades.AspNetCore.SecurityHeaders.Infrastructure;

var builder = WebApplication.CreateBuilder(args);

builder.WebHost.ConfigureKestrel(serverOptions =>
{
    serverOptions.AddServerHeader = false;
});

var services = builder.Services;
var configuration = builder.Configuration;

services.AddSecurityHeaderPolicies()
    .SetPolicySelector((PolicySelectorContext ctx) =>
    {
        return SecurityHeadersDefinitions.GetHeaderPolicyCollection(
            builder.Environment.IsDevelopment(),
            configuration["OpenIDConnectSettings:Authority"]);
    });

services.AddAntiforgery(options =>
{
    options.HeaderName = "X-XSRF-TOKEN";
    options.Cookie.Name = "__Host-X-XSRF-TOKEN";
    options.Cookie.SameSite = SameSiteMode.Strict;
    options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
});

services.AddHttpClient();
services.AddOptions();

services.AddAuthentication(options =>
{
    options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme;
    options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme;
})
.AddCookie()
.AddOpenIdConnect(options =>
{
    var oidcConfig = builder.Configuration.GetSection("OpenIDConnectSettings");

    options.Authority = oidcConfig["Authority"];
    options.ClientId = oidcConfig["ClientId"];
    options.ClientSecret = oidcConfig["ClientSecret"];

    options.SignInScheme = CookieAuthenticationDefaults.AuthenticationScheme;
    options.ResponseType = OpenIdConnectResponseType.Code;

    options.SaveTokens = true;
    options.GetClaimsFromUserInfoEndpoint = true;

    options.MapInboundClaims = false;
    options.TokenValidationParameters.NameClaimType = JwtRegisteredClaimNames.Name;
});

services.AddControllersWithViews(options =>
    options.Filters.Add(new AutoValidateAntiforgeryTokenAttribute()));

services.AddRazorPages().AddMvcOptions(options =>
{
    //var policy = new AuthorizationPolicyBuilder()
    //    .RequireAuthenticatedUser()
    //    .Build();
    //options.Filters.Add(new AuthorizeFilter(policy));
});

builder.Services.AddReverseProxy()
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

var app = builder.Build();

//IdentityModelEventSource.ShowPII = true;
JsonWebTokenHandler.DefaultInboundClaimTypeMap.Clear();

if (app.Environment.IsDevelopment())
{
    app.UseDeveloperExceptionPage();
    app.UseWebAssemblyDebugging();
}
else
{
    app.UseExceptionHandler("/Error");
}

app.UseSecurityHeaders();
app.UseHttpsRedirection();
app.UseStaticFiles();
app.UseRouting();

app.UseNoUnauthorizedRedirect("/api");
app.UseAuthorization();

app.MapRazorPages();
app.MapControllers();
app.MapNotFound("/api/{**segment}");

if (app.Environment.IsDevelopment())
{
    var uiDevServer = app.Configuration.GetValue<string>("UiDevServerUrl");
    if (!string.IsNullOrEmpty(uiDevServer))
    {
        app.MapReverseProxy();
    }
}

app.MapFallbackToPage("/_Host");

app.Run();

Development environment

Developers require a professional development setup and should use each technical stack the way its creators recommend. Default development environments are the aim and are always the easiest to maintain. The Vue.js project uses a default Vite environment and the best practices the Vue.js community recommends. The server part of the application must proxy all UI requests to the Vue.js development environment; I use the Microsoft YARP reverse proxy to implement this. In this setup it is only required for development, although some flavors of BFF use proxies in production as well.
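For reference, a ReverseProxy section in the development appsettings might look roughly like the following sketch. The route, cluster, and destination names are illustrative, and the address assumes the Vite dev server on port 4202 configured earlier:

"ReverseProxy": {
  "Routes": {
    "ui": {
      "ClusterId": "uiCluster",
      "Match": {
        "Path": "{**catch-all}"
      }
    }
  },
  "Clusters": {
    "uiCluster": {
      "Destinations": {
        "viteDevServer": {
          "Address": "https://localhost:4202"
        }
      }
    }
  }
}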

Testing and running

The appsettings.json MUST be updated with your OpenIddict server configuration values. If using a client secret, store this in the user secrets for development, or in a key vault when deployed to Azure.

"OpenIDConnectSettings": {
  "Authority": "https://localhost:44318",
  "ClientId": "oidc-pkce-confidential",
  "ClientSecret": "oidc-pkce-confidential_secret"
},

Debugging

Start the Vue.js project from the ui folder

npm start

Start the ASP.NET Core project from the server folder

dotnet run

When the server app is opened on localhost, you can authenticate and use the application.

Notes

I was not able to apply the nonce to the dev environment styles of the Vue.js part. This would be of great benefit as you can prevent insecure styles in development and not discover these problems after a deployment. In the production build, the nonce is applied correctly.

Links:

https://vuejs.org/

https://vitejs.dev/

https://github.com/vuejs/create-vue

https://documentation.openiddict.com/

https://www.koderhq.com/tutorial/vue/vite/

https://github.com/damienbod/bff-aspnetcore-angular

https://github.com/damienbod/bff-openiddict-aspnetcore-angular

https://github.com/damienbod/bff-aspnetcore-vuejs

Sunday, 17. November 2024

IdM Laboratory

On Linkability and Unlinkability (Part 4)

Hello, this is Fujie.
Let's continue looking at the issue of credential linkability when thinking about the wallet model.
The original article is this post by Wayne of Spruce ID: https://blog.spruceid.com/provably-forgotten-signatures-adding-privacy-to-digital-identity/
This has turned into a long series one way or another, but we are finally getting to the substance.


Last time I covered collusion between RPs; the discussion of the problem continues as follows.
To maximize privacy, these pieces of data presented using a VDC should be “unlinkable.” For instance, if the same user who’d proven their age at a content website later went to a bank and proved their name, no one should be able to connect those two data points to the same ID holder, not even if the content website and the bank work together. We wouldn’t want the bank to make unfair financial credit decisions based on the perceived web browsing habits of the user.


However, VDCs are sometimes built on a single digital signature, a unique value that can be used to track or collate information about a user if shared repeatedly with one or more parties. If the content website in our example retains the single digital signature created by the issuing authority, and that same digital signature was also shared with the bank, then the content website and the bank could collude to discover more information about the user than what was intended.


The case where two or more verifiers of information can collude to learn more about the user is known as verifier-verifier collusion and can violate user privacy. While a name-age combination may seem innocuous, a third-party data collector could, over time, assemble a variety of data about a user simply by tracking their usage of unique values across many different verifiers, whether online or in-person. At scale, these issues can compound into dystopian surveillance schemes by allowing every digital interaction to be tracked and made available to the highest bidders or an unchecked central authority.


What this says is that, just as was already an issue in the federation model, passing ID information multiple times to one or more parties under the same identifier makes unintended attribute lookups possible. The concern raised here is that with Verifiable Credentials (often written as Verifiable Digital Credentials/VDC in recent NIST materials) the same thing happens because the digital signature is common across presentations.

This is exactly what the figure above illustrates.

So what should we do about it? The same thing as in federation: hand over pairwise values on a per-verifier basis.


Fortunately, a simple solution exists to help prevent verifier-verifier collusion by cycling digital signatures so that each is used only once. When a new VDC is issued by a post office, DMV, or other issuer, it can be provisioned not with a single signature from the issuing authority that produces linkable usage, but with many different signatures from the issuing authority. If user device keys are necessary for using the VDC, as in the case of mobile driver’s licenses, several different keys can be used as well. A properly configured digital wallet would then use a fresh signature (and potentially a fresh key) every time an ID holder uses their VDC to attest to particular pieces of information, ideally preventing linkage to the user through the signatures.


Using our earlier example of a user who goes to a content website and uses their VDC to prove they are over 18, the digital wallet presents a signature for this interaction, and doesn’t use that signature again. When the user then visits their bank and uses a VDC to prove their name for account verification purposes, the digital wallet uses a new signature for that interaction.


Because the signatures are different across each presentation, the content website and the bank cannot collude to link these two interactions back to the same user without additional information. The user can even use different signatures every time they visit the same content website, so that the content website cannot even tell how often the user visits from repeated use of their digital ID.



In short, the point is to rotate keys so that each one is used only once. This approach seems fine for achieving transient presentations, but the implementation cost looks fairly high. You would presumably end up having to keep publishing every public key ever used for past signatures (or include them in the credential itself), and you cannot stay persistent with the same verifier. Some further refinement seems to be needed.
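To make the batch-issuance idea concrete, here is a purely illustrative Python sketch (not from the article, and glossing over the actual cryptography): a wallet holds many pre-issued signatures over the same claims and never shows the same one twice.

from dataclasses import dataclass, field

@dataclass
class SignedCopy:
    claims: dict
    signature: bytes  # one of many issuer signatures over the same claims

@dataclass
class Wallet:
    unused: list[SignedCopy] = field(default_factory=list)

    def present(self, disclosed_keys: list[str]) -> dict:
        # Pop a fresh signed copy so the same signature is never shown twice
        copy = self.unused.pop()
        return {
            "claims": {k: copy.claims[k] for k in disclosed_keys},
            "signature": copy.signature,
        }

# The issuer provisions many signatures over the same claims at issuance time
wallet = Wallet(unused=[
    SignedCopy({"age_over_18": True, "name": "Alice"}, signature=bytes([i]))
    for i in range(10)
])

p1 = wallet.present(["age_over_18"])  # shown to the content website
p2 = wallet.present(["name"])         # shown to the bank, with a different signature
assert p1["signature"] != p2["signature"]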


Next time, let's look at the case of issuer/verifier collusion.

 


Simon Willison

LLM 0.18


LLM 0.18

New release of LLM. The big new feature is asynchronous model support - you can now use supported models in async Python code like this:

import llm

model = llm.get_async_model("gpt-4o")
async for chunk in model.prompt(
    "Five surprising names for a pet pelican"
):
    print(chunk, end="", flush=True)

Also new in this release: support for sending audio attachments to OpenAI's gpt-4o-audio-preview model.
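As a rough sketch of how that might look (not from the release notes; the file name is illustrative and this assumes the llm.Attachment helper from the attachments feature):

import llm

model = llm.get_model("gpt-4o-audio-preview")
# Attach an audio file and print the model's response
response = model.prompt(
    "Transcribe this audio clip",
    attachments=[llm.Attachment(path="recording.mp3")],
)
print(response.text())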

Tags: async, llm, python, generative-ai, projects, ai, llms


Ben Werdmüller

Elon Musk algorithmically boosted Republican accounts on X from the moment he endorsed Trump


[Timothy Graham and Mark Andrejevic]

Elon Musk didn't just endorse Trump with his words - according to this pre-print research paper, he gave Republicans an algorithmic boost on X, too:

"The analysis reveals a structural engagement shift around mid-July 2024, suggesting platform-level changes that influenced engagement metrics for all accounts under examination. The date at which the structural break (spike) in engagement occurs coincides with Elon Musk’s formal endorsement of Donald Trump on 13th July 2024."

Despite big words about "free speech", Musk seems to be hell-bent on using the platform he acquired as a megaphone for his own interests, in the same way that Rupert Murdoch has used Fox News. To me, this points to the need for media regulation, and for anyone using the platform to approach it with caution. It's not an even playing field - not even close.

#Democracy

[Link]

Saturday, 16. November 2024

IdM Laboratory

On Linkability and Unlinkability (Part 3)

Hello, this is Fujie.
Let's continue looking at the issue of credential linkability when thinking about the wallet model.
The original article is this post by Wayne of Spruce ID: https://blog.spruceid.com/provably-forgotten-signatures-adding-privacy-to-digital-identity/

This is of course shared with the federation model: there are broadly two patterns of linkability. One is collusion between RPs (verifiers); the other is collusion between the IdP (issuer) and an RP (verifier).
First, the pattern where RPs (verifiers) collude with each other.
One goal for a verifiable digital credential system is that a credential can be used to present only the necessary facts in a particular situation, and nothing more. For instance, a VDC could prove to an age-restricted content website that someone is over a certain age, without revealing their address, date of birth, or full name. This ability to limit disclosures allows the use of functional identity, and it’s one big privacy advantage of a VDC system over today’s identity systems that store a complete scan of a passport or driver’s license. However, even with selective disclosure of data fields, it is possible to unintentionally have those presentations linkable if the same unique values are used across verifiers.


In our example, if a user proves their age to access an age-restricted content website (henceforth referred to simply as “content website”), and then later verifies their name at a bank, both interactions may run the risk of revealing more information than the user wanted if the content website and bank colluded by comparing common data elements they received. Although a check for “over 18 years old” and a name don’t have any apparent overlap, there are technical implementation details such as digital signatures and signing keys that, when reused across interactions, can create a smoking gun.


Notably, the same digital signature is uniquely distinguishable, and also new signatures made from the same user key can be correlated. This can all work against the user to reveal more information than intended.


Up to now the focus has been on linking via identifiers, countered with pairwise identifiers, but linking via digital signatures is now becoming the problem.


In other words, this is what is happening.

When I first heard about SD-JWT at IIW a few years ago, I asked Torsten this question, but at that stage linkability was not yet that big a topic. The sense at the time was that selective disclosure had to work properly first.

It feels like the discussion has finally come this far.


Next time I will continue to dig deeper into this area.


 

 

 

 

 



Simon Willison

Project: Civic Band - scraping and searching PDF meeting minutes from hundreds of municipalities


I interviewed Philip James about Civic Band, his "slowly growing collection of databases of the minutes from civic governments". Philip demonstrated the site and talked through his pipeline for scraping and indexing meeting minutes from many different local government authorities around the USA.

We recorded this conversation as part of yesterday's Datasette Public Office Hours session.

Civic Band

Philip was inspired to start thinking more about local government after the 2016 US election. He realised that there was a huge amount of information about decisions made by local authorities tucked away in their meeting minutes, but that information was hidden away in thousands of PDF files across many different websites.

There was this massive backlog of basically every decision that had ever been made by one of these bodies. But it was almost impossible to discover because it lives in these systems where the method of exchange is a PDF.

Philip lives in Alameda, which makes its minutes available via this portal powered by Legistar. It turns out there are a small number of vendors that provide this kind of software tool, so once you've written a scraper for one it's likely to work for many others as well.

Here's the Civic Band portal for Alameda, powered by Datasette.

It's running the datasette-search-all plugin and has both tables configured for full-text search. Here's a search for housing:

The technical stack

The public Civic Band sites all run using Datasette in Docker Containers - one container per municipality. They're hosted on a single Hetzner machine.

The ingestion pipeline runs separately from the main hosting environment, using a Mac Mini on Philip's desk at home.

OCR works by breaking each PDF up into images and then running Tesseract OCR against them directly on the Mac Mini. This processes on the order of 10,000 or fewer new pages of documents a day.
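The pipeline code itself isn't shown in this post, but the general approach could be sketched roughly like this, assuming the pdf2image and pytesseract libraries (the file name is illustrative):

from pdf2image import convert_from_path
import pytesseract

# Render each PDF page to an image, then OCR it with Tesseract
pages = convert_from_path("minutes-2024-11-05.pdf", dpi=200)
for number, image in enumerate(pages, start=1):
    text = pytesseract.image_to_string(image)
    print(number, text[:80])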

Philip treats PDF as a normalization target, because the pipeline is designed around documents with pages of text. In the rare event that a municipality publishes documents in another format such as .docx, he converts them to PDF before processing.

PNG images of the PDF pages are served via a CDN, and the OCRd text is written to SQLite database files - one per municipality. SQLite FTS provides full-text search.
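As an illustrative sketch (the table and column names are assumptions, not Civic Band's actual schema), enabling full-text search on one of those per-municipality databases with sqlite-utils could look like this:

import sqlite_utils

db = sqlite_utils.Database("alameda.db")
# One row per OCRd page of a document
db["pages"].insert({"document": "2024-11-05 council minutes", "page": 1, "text": "..."})
# Create an FTS index over the OCRd text, kept in sync with triggers
db["pages"].enable_fts(["text"], create_triggers=True)
print(list(db["pages"].search("housing")))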

Scale and storage

The entire project currently comes to about 265GB on disk. The PNGs of the pages use about 350GB of CDN storage.

Most of the individual SQLite databases are very small. The largest is for Maui County which is around 535MB because that county has professional stenographers taking detailed notes for every one of their meetings.

Each city adds only a few documents a week so growth is manageable even as the number of cities grows.

Future plans

We talked quite a bit about a goal to allow users to subscribe to updates that match specific search terms.

Philip has been building out a separate site called Civic Observer to address this need, which will store searches and then execute them periodically using the Datasette JSON API, with a Django app recording state to avoid sending the same alert more than once.
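As a hedged sketch of that kind of periodic check (the URL, database, and table names here are made up, not Civic Band's real ones), a saved search could be run against the Datasette JSON API like this:

import requests

# Execute a saved full-text search against a hypothetical Civic Band Datasette instance
response = requests.get(
    "https://example-city.civicband.org/minutes/pages.json",
    params={"_search": "bike lanes"},
)
for row in response.json()["rows"]:
    print(row)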

I've had a long-term ambition to build some kind of saved search alerts plugin for Datasette generally, to allow users to subscribe to new results for arbitrary SQL queries. My sqlite-chronicle library is part of that effort - it uses SQLite triggers to maintain version numbers for individual rows in a table, allowing you to query just the rows that have been inserted or modified since the version number from the last time you ran the query.

Philip is keen to talk to anyone who is interested in using Civic Band or helping expand it to even more cities. You can find him on the Datasette Discord.

Tags: data-journalism, political-hacking, politics, sqlite, datasette


Doc Searls Weblog

The Redstream Media


To identify the microphones in that river, here’s Apple:

Two of these are not like the others. Just two. And one of those faces calls himself a “raging moderate.”

That river began with the copious red pissings of Rush Limbaugh. Now eight Rushians comprise most of the News Commentary flow on Apple Podcasts, and much of the whole podcast watershed as well. (None are so skilled as Rush, but that’s another story.)

It’s not much different on Spotify:

Six out of the top ten.

And, of course, there’s Fox News (and hell, all of News Corp), and Elon Musk’s X .

As Michael Tomasky puts it in The New Republic,

Once upon a time, the mainstream media was a beachball, and the right-wing media was a golf ball. Today, the mainstream media (what with layoffs and closures and the near death of serious local news reporting) is the size of a volleyball, and the right-wing media is the size of a basketball, which, in case you’re wondering, is bigger.

In case that doesn’t hammer the point home hard enough, he adds this:

Today, the right-wing media—Fox News (and the entire News Corp.), Newsmax, One America News Network, the Sinclair network of radio and TV stations and newspapers, iHeart Media (formerly Clear Channel), the Bott Radio Network (Christian radio), Elon Musk’s X, the huge podcasts like Joe Rogan’s, and much more—sets the news agenda in this country. And they fed their audiences a diet of slanted and distorted information that made it possible for Trump to win.

Let me say that again, in case it got lost: Today, the right-wing media sets the news agenda in this country. Not The New York Times. Not The Washington Post (which bent over backwards to exert no influence when Jeff Bezos pulled the paper’s Harris endorsement). Not CBS, NBC, and ABC. The agenda is set by all the outlets I listed in the above paragraph. Even the mighty New York Times follows in its wake, aping the tone they set disturbingly often.

And he barely mentions podcasting.

According to Newsweek, Joe Rogan‘s interview with Donald Trump was viewed over 26 million times in the 24 hours after it went up. And that was just the video. Rogan has 14.5 million followers on Spotify alone for his all-audio podcast.

We can see the reason why Spotify paid $200 million for Joe Rogan back in 2022. Hell, you could buy the whole AM band in New York for less than that today. Wait a few years and the FM band will be on the bargain shelf as well, because radio is being eaten alive on the talk side by podcasting and on the music side by streaming. (EMarketer has lots of numbers here.)

Fox owns what’s left of news on cable. Since Trump won the presidency, viewing of MSNBC has tanked, while Fox News continues to grow. (And stars of its shows are named to Trump’s new cabinet positions. It’s a system, folks.)

Scott Galloway, the raging moderate, expands on one part of this in “The Podcast Election.” An excerpt (from both post and podcast):

New(er) Media 

New forms of media periodically reshape our culture and politics. FDR mastered radio, JFK leveraged TV, and Reagan nailed cable news. Obama energized young voters via the internet. Trump hijacked the world’s attention on Twitter. This year it was podcasting. The three biggest media events of this fall were the debate and Harris and Trump’s respective appearances on Call Her Daddy and The Joe Rogan Experience.

Almost half of adult Americans, 136 million people, listen to at least one podcast a month. The global audience is now 505 million, a quarter of the internet’s reach. When Trump went on Joe Rogan, Lex Fridman, and This Past Weekend w/Theo Von, he was embracing the manosphere and riding a tectonic shift in media: The most efficient way to reach the largest and most persuadable audience (i.e., young men) is via podcast. Nothing comes close.

Reach and Focus

 

Rogan has 16 million Spotify subscribers and can reach many more people across a variety of other platforms: In just three days after the live podcast, his three-hour-long conversation with Trump was viewed 40 million times on YouTube. The audio downloads likely exceeded 15 million. There will be a lot of second-guessing re what the Harris campaign should have done. Getting on a plane to Austin to visit Rogan would have been a layup.

Obviously, this is a very big thing: so big that we need a label for the sum of all these popular right-wing news and commentary outlets: one we can pose against what’s still called “mainstream,” but isn’t.

I suggest redstream media.

Spread the words.

[Later…18 November] Influencers too.

 

 

 


Simon Willison

NuExtract 1.5


NuExtract 1.5

Structured extraction - where an LLM helps turn unstructured text (or image content) into structured data - remains one of the most directly useful applications of LLMs.

NuExtract is a family of small models directly trained for this purpose (though text only at the moment) and released under the MIT license.

It comes in a variety of shapes and sizes:

NuExtract-v1.5 is a 3.8B parameter model fine-tuned on Phi-3.5-mini instruct. You can try this one out in this playground.

NuExtract-tiny-v1.5 is 494M parameters, fine-tuned on Qwen2.5-0.5B.

NuExtract-1.5-smol is 1.7B parameters, fine-tuned on SmolLM2-1.7B.

All three models were fine-tuned on NuMind's "private high-quality dataset". It's interesting to see a model family that uses one fine-tuning set against three completely different base models.

Useful tip from Steffen Röcker:

Make sure to use it with low temperature, I've uploaded NuExtract-tiny-v1.5 to Ollama and set it to 0. With the Ollama default of 0.7 it started repeating the input text. It works really well despite being so smol.
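As a sketch of that tip (the model tag is a placeholder for whatever name the model was given when it was added to Ollama, and the prompt is illustrative), the temperature can be pinned to 0 via the Ollama REST API:

import requests

# Ask a locally running Ollama instance to extract with temperature 0
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "nuextract-tiny-v1.5",  # placeholder tag for the uploaded model
        "prompt": "<extraction template and input text go here>",
        "stream": False,
        "options": {"temperature": 0},
    },
)
print(response.json()["response"])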

Tags: llms, ai, generative-ai, fine-tuning, phi, qwen, hugging-face


Ben Werdmüller

Escape from Twitter: The Future of Social Media Is Decentralized


This is a pretty great article about the decentralized social web, which quotes Christine Lemmer-Webber, Blaine Cook, and me.

It’s in Polish, but if you don’t speak the language, the “translate” button on your browser works pretty well.

Here are the full remarks I sent Michał “rysiek” Woźniak, the author of the piece:

Social media is where people learn about the world: they discover the news, connect with each other, share the things they love and what's happening around them. We learn about art and love; about current events; and sometimes, about injustice and war — all at a global scale.

The owners of these spaces have the power to influence the global conversation to fit their business needs. Business model changes at every centralized social media company have made it harder to reach your community, but it goes beyond that. We recently saw the owner of X heavily weigh in on the US election. Previously, lapses at Facebook helped lead to genocide in Myanmar. These spaces are too important to be privately owned or to be subject to any single owner's needs or whims.

Decentralized social media divests ownership back to the people. Federated social networks are co-operatives of small communities, each with their own ownership and their own rules. Fully decentralized social networks allow users to make their own choices about how their content is moderated and presented to them. There is never a single owner who can unilaterally change the conversation; the platform is owned by everybody, just as the web itself is owned by everybody.

In answer to a question about my employer, ProPublica, its involvement in the Fediverse, and advice I might have for other publishers, I wrote:

ProPublica was already on the fediverse before I got there. That's down to Chris Morran, a member of the audience team. But, of course, I've been a strong advocate.

My main advice is: be everywhere your audience is. That does mean Mastodon and Bluesky - and we've had strong engagement on both. Use your own domain to validate your accounts and encourage your staff to join individually. By using cutting edge social media platforms and not being afraid to experiment early, ProPublica has so far bucked the downward trends that have been seen at other publications.

You can read the whole piece here.


IdM Laboratory

On Linkability and Unlinkability (Part 2)

Hello, this is Fujie.
Even in the era of digital ID wallets, the privacy problem of unintended disclosure of attribute information across contexts through identity matching and linking does not go away. Linkability of digital credentials is being discussed in many places, and unless it is resolved soon I think it will become a major barrier to serious real-world deployment.


So, let's read through Wayne's material that I introduced yesterday: https://blog.spruceid.com/provably-forgotten-signatures-adding-privacy-to-digital-identity/
Incidentally, it seems this topic was also presented at NIST in September: https://csrc.nist.gov/Presentations/2024/wpec2024-3b4
In the EU as well, wallet governance and linkability seem to be a fairly hot topic, so this area will likely become an issue once wallets that governments actually manage or certify come into use.
With that, here is a digest of the contents.
First, the background and problem statement.
Developing models to implement this VDC future requires carefully thinking through every risk of the new model–including risks in the future. One of the edge-case risks privacy researchers have identified is sometimes known as “linkability.”


Incidentally, VDC stands for Verifiable Digital Credentials, i.e. things like VCs and mDLs.

 

Linkability refers to the possibility of profiling people by collating data from their use of digital credentials. This risk commonly arises when traceable digital signatures or identifiers are used repeatedly, allowing different parties to correlate many interactions back to the same individual, thus compromising privacy. This can create surveillance potential across societies, whether conducted by the private sector, state actors, or even foreign adversaries.


This is an explanation of linkability itself. As I wrote last time, record matching via identifiers has long been an issue, but here digital signatures are explicitly called out as well. The crux of this discussion is matching via signatures (more on that below). In short, if records can be matched they can be linked, and that leads to privacy risk.


In this work, we explore an approach that adds privacy by upgrading existing systems to prevent linkability (or “correlation”) and instead of overhauling them entirely. It aims to be compatible with already-deployed implementations of digital credential standards such as ISO/IEC 18013-5 mDL, SD-JWT, and W3C Verifiable Credentials, while also aligning with cryptographic security standards such as FIPS 140-2/3. It is compatible with and can even pave the way for future privacy technologies such as post-quantum cryptography (PQC) or zero-knowledge proofs (ZKPs) while unlocking beneficial use cases today. 


Whether PQC or ZKPs could be used cleverly to solve this problem was also discussed at IIW, so it is clearly an area drawing attention.


I introduced pseudonyms last time as well, and this issue has been around since the SAML era. So why is it being taken up as a problem again now? The article touches on that as follows.

Governments are rapidly implementing digital identity programs. In the US, 13 states already have live mobile driver’s license (mDL) programs, with over 30 states considering them, and growing. Earlier this year, the EU has approved a digital wallet framework which will mandate live digital wallets across its member states by 2026. This is continuing the momentum of the last generation of digital identity programs with remarkable uptake, such as India’s Aadhaar which is used by over 1.3 billion people. However, it is not clear that these frameworks plan for guarantees like unlinkability in the base technology, yet the adoption momentum increases.


Some think that progress on digital identity programs should stop entirely until perfect privacy is solved. However, that train has long left the station, and calls to dismantle what already exists, has sunk costs, and seems to function may fall on deaf ears. There are indeed incentives for the momentum to continue: demands for convenient online access to government services or new security systems that can curb the tide of AI-generated fraud. Also, it’s not clear that the best approach is to design the “perfect” system upfront, without the benefit of iterative learning from real-world deployments.


With wallet-centric models being rolled out, starting with national IDs and mobile driver's licenses, this problem once again needs to be solved.

I will look at the specific issues and mitigations from the next post onward.

Friday, 15. November 2024

Ben Werdmüller

Biden Asked Microsoft to “Raise the Bar on Cybersecurity.” He May Have Helped Create an Illegal Monopoly.


[Renee Dudley, with research by Doris Burke, at ProPublica]

Security lapses in Microsoft's own products led to hacks that in turn pushed President Biden to ask for help from it and other tech companies to improve White House security. Microsoft saw it as an opportunity to lock the White House into its products.

Microsoft pledged to give $150M in technical services to the government to upgrade its security. But it wasn't altruistic:

"Microsoft’s seemingly straightforward commitment belied a more complex, profit-driven agenda, a ProPublica investigation has found. The proposal was, in fact, a calculated business maneuver designed to bring in billions of dollars in new revenue, box competitors out of lucrative government contracts and tighten the company’s grip on federal business."

The result may have created an illegal monopoly on government systems - and increased its susceptibility to future Microsoft flaws:

"Competition is not the only issue at stake. As Washington has deepened its relationship with Microsoft, congressional leaders have raised concerns about what they call a cybersecurity “monoculture” in the federal government. Some, like Wyden and Sen. Eric Schmitt, a Republican from Missouri, have blasted the Defense Department in particular for “doubling down on a failed strategy of increasing its dependence on Microsoft.”"

Monocultures are bad. It's hard to see how these kinds of toxic relationships don't get worse over the next four years.

#Technology

[Link]


Doc Searls Weblog

Going Local With Open Networks


If you’re tired of moaning (or celebrating) the after-effects of the U.S. election, or how all of us seem trapped inside the captive markets of Amazon, Apple, Alphabet, Microsoft, Meta, and other feudal powers, take in a talk about something constructive that’s nowhere near any of that, but serves as a model for economies and cultures everywhere: India’s Open Network for Digital Commerce, or ONDC, which runs on the Beckn protocol.

Shwetha Rao

That’s what Shwetha Rao will be talking about next Tuesday, 19 November, at Indiana University and on Zoom, at noon Eastern Time.

The title is How Open Networks Impact Local Economies: Lessons from India, and here is how she puts it at that link:

In today’s digital commerce landscape, where major platforms hold significant influence, small businesses and local communities often face substantial barriers. Open networks are helping to shift this landscape by creating a more inclusive and accessible digital economy. A leading example is India’s Open Network for Digital Commerce (ONDC), which demonstrates how decentralized systems can boost digital access and economic opportunity. Unlike traditional platform-centric models, ONDC enables buyers and sellers to transact across various applications, breaking down barriers and fostering inclusivity. This government-backed initiative now connects over a million sellers with shoppers in 600+ cities, supporting small businesses and rural entrepreneurs.

This talk provides insights into the challenges and successes of scaling such open networks, underscoring the role of community engagement and sustainable growth models in driving equitable access and paving the way for a more decentralized digital future.

ONDC is a happening thing:

And, as far as I know, it isn’t happening anywhere outside of India. But it should. And, being open, it could. Might it involve personal AI? We’ll be asking that question, among many others.

The theme for our Beyond the Web salons this academic year is Think Globally, Eat Here: Local Solutions for Big Tech Problems. This solution might be the biggest so far.

As a salon, this will be short on lecture and long on conversation. So bring your curiosity and your questions. We’ll see you there.

 

 


Werdmüller on Medium

What I want from Mozilla


A people-first, safe, independent web.

Continue reading on Medium »


Ben Werdmüller

What I want from Mozilla


Like many of you, I received a survey today with the title: “What is your dream for Mozilla?” I filled it in, but the potential for Mozilla is so expansive and critical to the future of the internet that I wanted to address my thoughts in greater depth here.

Mozilla describes its mission as follows:

Our mission is to ensure the Internet is a global public resource, open and accessible to all. An Internet that truly puts people first, where individuals can shape their own experience and are empowered, safe and independent.

I believe Mozilla is best placed to achieve this goal by explicitly fostering an ecosystem of open, accessible software that promotes user independence, privacy, and safety. It should be a facilitator, supporter, and convener through which projects that promote these values thrive.

What should its next chapter look like in an internet increasingly dominated by corporate interests? Mozilla has the tools, the history, and the mission to reclaim its role as a pioneer of the open web. But doing so requires bold steps and a renewed focus on impact and innovation.

A mission focus on impact

Its success should be determined through impact. It should publish an impact report that shows how it has spread usable, private, open software worldwide, and solicit donations based on that activity. How has Mozilla prevented a monopoly of ad-driven surveillance technology in different markets? How has Mozilla helped people keep themselves safe online while seeking reproductive healthcare? How has Mozilla tech been used in authoritarian regions to support community well-being? It should clarify its roadmap for turning its mission into measurable outcomes, and then be unashamed about fundraising based on this directed mission. These focused impact reports would guide internal strategy, demonstrate accountability, and inspire public and donor trust.

Conversely, I believe Mozilla is not a media company. That means it should not attempt to be Consumer Reports; we don’t need it to navigate the world of AI for us or tell us what to buy for Christmas. Those are valuable pursuits, but Mozilla should leave them to existing technology media companies.

Impact-focused products that bring something new to the table

I believe this impact focus means that it should not seek to charge consumers for its products. If the mission is to make the internet open, accessible, private, and safe for individuals, as much friction towards achieving that goal should be removed as possible.

Many of Mozilla’s efforts already fall in line with this mission. The Firefox browser itself is an open, anti-surveillance alternative to corporate-driven browsers like Chrome, although it has fallen behind. This is in part because of anti-competitive activity from companies like Google, and in part because some of the most interesting innovations in the browser space have happened elsewhere: for example, Arc’s radical changes to browser user experience are really compelling, and should probably have been a Mozilla experiment.

Firefox Relay — which makes it easy to hide your email address when dealing with a third party — and Mozilla VPN are similarly in line at first glance. But because the VPN is little more than a wrapped Mullvad VPN, with revenue splitting between the two organizations, it isn’t really adding anything new. In a similar vein, Relay is very similar to DuckDuckGo’s email protection, among others. And why is one branded as Firefox and one as Mozilla? I’m sure the organization itself has an answer to this, but I couldn’t begin to tell you. (For what it’s worth, Mozilla seems to agree about the distraction and has scaled back support for these services.)

AI is a new, hot technology, but there’s nothing really new for Mozilla to do here, either. Many vendors are working on AI privacy, because that’s where a lot of the real revenue is: organizations with privacy needs that relate to sensitive information. There is no reason why Mozilla will be the best at creating these solutions, or differentiated in doing so.

Instead, to paraphrase Bill Clinton: it’s the web, stupid.

If Firefox is the biggest, most impactful software product in Mozilla’s arsenal today, how can it bring it back to prominence? One interesting route might be to use it as a way for third parties to explore the future of the browser. Mozilla can ship its own Firefox user experience, but what if it was incredibly simple for other people to also build wildly remixed browsers? Could Mozilla build unique features, like privacy layers tailored for vulnerable users, that competitors don’t offer?

Projects like Zen Browser already use core Firefox to build new experiences, but there’s a lot of coding involved, and they’re not discoverable from within Firefox itself. What if they were? One can imagine Firefox browsers optimized for everything from artists and activists to salespeople and investors, all available from a browser marketplace. The authors of those experiences would, by sharing their unique browser remixes, help spread the Firefox browser overall. While browsers like Chrome serve corporate goals around ads and analytics, the Mozilla mission gives Firefox a mandate to be a playground for innovation. It should be that. (And, yes, AI can play a supporting role here too.)

Note that while I think products should be made available to consumers free of charge, that doesn’t mean that Mozilla shouldn’t make money. For example, if there’s revenue in specific experiences for certain enterprise or partner use cases, why not explore that? Enterprise offerings could directly fund Mozilla’s open-source projects, reinforcing its mission.

Truly supporting a vibrant open web

While Mozilla’s products are key to advancing its mission, its influence can extend far beyond the browser. Mozilla has the potential to be a home base for similar projects that have the potential to create a more open, private, safe and self-directed web.

While that might mean support technically — developer resources, libraries, and guides — the most burning needs for user-centric open source projects are often unrelated to code. These include:

Experience design. Most open source projects lean towards coding as a core competency and aren’t able to provide the same polished user experiences as commercial software. Mozilla could bridge the gap by providing training and direct resources to elevate the design of user-centric open source projects, and to prepare these projects to work well with designers.

Legal help. Some projects need help with boilerplate documents like privacy policies, terms of service agreements, and contributor license agreements; others need assistance figuring out licensing; some will have more individual legal needs. It’s highly unlikely that most projects have the ability to produce this in-house, meaning they either leave themselves open to liabilities by not getting legal advice, or have to retain legal help at a high cost to themselves. Mozilla can help.

Policy assistance. Mozilla could help projects navigate complex regulatory environments, such as GDPR or CCPA compliance or lobbying for user-first policies globally.

Funding. Offering grants or investments for vetted open source projects could amplify Mozilla’s impact. It’s done this in the past a little bit through its defunct WebFWD accelerator and specific grants, and it’s doing a version of this today with its accelerator for advancing open source AI. There’s room for a wider scope here, and a little bit of a carrot-and-stick approach: for example, funding could be contingent on a project demonstrating its human-centered approach and being willing to work with designers.

Go-to-market strategy. Mozilla could provide guidance on launching and scaling projects, including identifying its first users, building community, and targeting messaging to them. Mozilla could host workshops on community engagement and messaging, enabling projects to scale effectively.

Regional impact. Different geographic communities have different needs. Regional accelerators could deliver it as a curriculum to local cohorts of open source teams. Regional accelerators could support open-source teams with tailored workshops and local mentorship, building capacity while addressing regional challenges.

A centralized Mozilla hub could provide templates, guides, and access to expert mentorship for projects to tackle legal, design, and policy hurdles. One-to-one help could be provided for the projects with the most potential to meaningfully fulfill Mozilla’s impact goals. And through it all, Mozilla can act as a connector: between the projects themselves, and to people and organizations in the tech industry who want to help mission-driven projects.

By creating a thriving ecosystem of user-centric open-source projects, Mozilla can ensure its mission outlasts individual products.

The dream of the nineties is alive in Mozilla

Mozilla has the tools, the history, and the mission to make the internet better for everyone. By fostering innovation and empowering communities, it can reclaim its role as a leader in the fight for an open web. Now is the time for bold action — and a strong focus on its mission.

That’s my dream for Mozilla. Now, what’s yours?


Simon Willison

Voting opens for Oxford Word of the Year 2024


One of the options is slop!

slop (n.): Art, writing, or other content generated using artificial intelligence, shared and distributed online in an indiscriminate or intrusive way, and characterized as being of low quality, inauthentic, or inaccurate.

Via @dloss

Tags: slop, ethics, generative-ai, ai, llms


John Philpin : Lifestream


🔗 National risks being eaten alive by NZ First & ACT

Dear Christopher (Luxon) .. I know Washington DC is a long way a way .. but there are some lessons there .. you can watch in real time the future of New Zealand .. at least the future based on the current trajectory.



So where do all the cool kids head to get their leather / Faux leather iPhone cases these days?

It’s been 5 years since I bought my last one and this Apple silicone stuff is definitely not cutting it.


Simon Willison

Recraft V3


Recraft are a generative AI design tool startup based out of London who released their v3 model a few weeks ago. It's currently sat at the top of the Artificial Analysis Image Arena Leaderboard, beating Midjourney and Flux 1.1 pro.

The thing that impressed me is that it can generate both raster and vector graphics... and the vector graphics can be exported as SVG!

Here's what I got for raccoon with a sign that says "I love trash" - SVG here.

That's an editable SVG - when I open it up in Pixelmator I can select and modify the individual paths and shapes:

They also have an API. I spent $1 on 1000 credits and then spent 80 credits (8 cents) making this SVG of a pelican riding a bicycle, using my API key stored in 1Password:

export RECRAFT_API_TOKEN="$(op item get recraft.ai --fields label=password \
  --format json | jq .value -r)"
curl https://external.api.recraft.ai/v1/images/generations \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $RECRAFT_API_TOKEN" \
  -d '{
    "prompt": "california brown pelican riding a bicycle",
    "style": "vector_illustration",
    "model": "recraftv3"
  }'

Tags: ai, generative-ai, text-to-image

Thursday, 14. November 2024

Just a Theory

Bye Twitter

I finally deleted all my Twitter accounts, notably @theory. If you see that username, it’s not me.

I quit Twitter in November 2022, when it became clear that Phony Stark would take control of the company. No regrets; I’ve been active on Mastodon since then, via Ivory, where I’ve enjoyed the Twitter ca. 2010 vibe.

But I hadn’t deleted my Tweets or my accounts until this week. The change was finally triggered by the latest announcement:

On Friday, X will adopt a new terms of service policy that for the first time explicitly says all users agree to let their posts be used to train artificial intelligence, like the company’s generative AI service, Grok. Currently, X users can go to the site’s settings and opt out.

I have no interest in feeding Grok or any other large language model. So this week I deleted all my tweets and then my accounts. I tried to transfer my personal username to a new, empty account, but in the second between changing it and switching to another tab to take it, it was somehow acquired by an account created in 2011 with no posts. Not sure how that worked. Pretty scummy.

So, just to be clear, anyone using these usernames on Twitter is not me anymore:

@theory @lunar_theory @DesignScene @pgxn

See you on Mastodon — or perhaps Bluesky.

More about… Twitter Personal

Simon Willison

OpenAI Public Bug Bounty


Reading this investigation of the security boundaries of OpenAI's Code Interpreter environment helped me realize that the rules for OpenAI's public bug bounty inadvertently double as the missing details for a whole bunch of different aspects of their platform.

This description of Code Interpreter is significantly more useful than their official documentation!

Code execution from within our sandboxed Python code interpreter is out of scope. (This is an intended product feature.) When the model executes Python code it does so within a sandbox. If you think you've gotten RCE outside the sandbox, you must include the output of uname -a. A result like the following indicates that you are inside the sandbox -- specifically note the 2016 kernel version:

Linux 9d23de67-3784-48f6-b935-4d224ed8f555 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

Inside the sandbox you would also see sandbox as the output of whoami, and as the only user in the output of ps.
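Out of curiosity, those signals are easy to script. The following is a minimal Python sketch based only on the uname and whoami hints quoted above; the exact kernel string is an assumption that may change over time:

import subprocess

def inside_code_interpreter_sandbox():
    # Heuristic from the bug bounty rules above: the ancient 4.4.0 (2016)
    # kernel reported by uname -a, plus the 'sandbox' user from whoami.
    uname = subprocess.run(["uname", "-a"], capture_output=True, text=True).stdout
    whoami = subprocess.run(["whoami"], capture_output=True, text=True).stdout.strip()
    return "4.4.0" in uname and "2016" in uname and whoami == "sandbox"

print(inside_code_interpreter_sandbox())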

Tags: security, generative-ai, openai, ai, llms, code-interpreter


IdM Laboratory

On linkability and unlinkability


Hello, this is Fujie.

At the recent Internet Identity Workshop (IIW) there were discussions about Verifiable Credentials formats, wallet management models, and linkability/unlinkability, so let's look at what those terms actually mean.

In short, this is the "Directed Identity" idea from Kim Cameron's The Laws of Identity: if multiple relying parties collude, identities can be correlated (linked) in unintended ways and attributes you never disclosed can become known. The idea, then, is to pseudonymize by, for example, using a different identifier per relying party (a so-called pairwise identifier).

That reminded me that I gave a talk about pseudonymity and anonymity a long time ago. (It is so old it is embarrassing.)

ID連携における仮名 (Pseudonyms in ID federation) from Naohiro Fujie

So how does this correlation issue relate to VCs and wallets? Wayne at SpruceID has published a good resource on exactly that, so let me share it.

Provably Forgotten Signatures: Adding Privacy to Digital Identity

https://blog.spruceid.com/provably-forgotten-signatures-adding-privacy-to-digital-identity/

Next time I plan to walk through the highlights.


Simon Willison

Quoting OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI


Anthropic declined to comment, but referred Bloomberg News to a five-hour podcast featuring Chief Executive Officer Dario Amodei that was released Monday.

"People call them scaling laws. That's a misnomer," he said on the podcast. "They're not laws of the universe. They're empirical regularities. I am going to bet in favor of them continuing, but I'm not certain of that."

[...]

An Anthropic spokesperson said the language about Opus was removed from the website as part of a marketing decision to only show available and benchmarked models. Asked whether Opus 3.5 would still be coming out this year, the spokesperson pointed to Amodei’s podcast remarks. In the interview, the CEO said Anthropic still plans to release the model but repeatedly declined to commit to a timetable.

OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI, Rachel Metz, Shirin Ghaffary, Dina Bass, and Julia Love for Bloomberg

Tags: anthropic, claude, generative-ai, ai, llms


Ben Werdmüller

Newsletter platform beehiiv launches multi-million dollar journalism fund


[Sara Fischer at Axios]

"Beehiiv, a newsletter startup taking aim at Substack, says it's making a "multi-million dollar investment" to create a new "beehiiv Media Collective" of journalists on its platform."

Beehiiv's new fund for independent journalists will give them a monthly health insurance stipend and pre-publish legal review support. There's also Getty access and deeper business strategy report. It's actually kind of remarkable - and a clear shot across the bow to competitors like Substack.

More competitors to Substack - which famously has supported actual Nazis - can only be a good thing. The real question is how long this fund will last, and whether the journalists who take advantage of it will sink or swim when it inevitably comes to an end. Hopefully everyone who takes part uses the time to become self-sufficient.

#Media

[Link]


John Philpin : Lifestream

💬 a fine piece of analysis from Kevin Drum.




🔗 The Onion has won the bid for Infowars’ assets. .. a perfect home.

“The Onion is proud to acquire Infowars, and we look forward to continuing its storied tradition of scaring the site’s users with lies until they fork over their cold, hard cash,” said The Onion CEO Ben Collins. “Or Bitcoin. We will also accept Bitcoin.”


Patrick Breyer

One year since the rejection of chat control


One year ago today, we adopted a cross-party position against indiscriminate chat control in the European Parliament's Committee on Civil Liberties, Justice and Home Affairs (LIBE). With that, we rejected the outrageous surveillance plans put forward by the von der Leyen EU Commission.

Alongside the European Parliament, the EU governments in the Council must adopt a position on the draft law. Three times already, the required majority has not been reached there. Both the Spanish Council presidency in October 2023 and the Belgian presidency in June 2024 pulled the vote on chat control from the agenda of the Council of the EU at short notice when it became clear the vote would fail, a huge success for civil society.

The European Commission is currently being reconstituted. So far, EU Home Affairs Commissioner Ylva Johansson (S&D, Sweden), the mother of chat control, did everything she could to ram her project through. Now Magnus Brunner (EPP, Austria) is set to succeed her in office and carry the project forward. In his hearing before the European Parliament, in response to a question from Pirate MEP Markéta Gregorová, he already signaled that he wants to bring the chat control negotiations to a conclusion. A commitment to protecting people from blanket chat control or to protecting secure encryption never crossed his lips. That is a mistake: the EU Commission must finally say goodbye to mass surveillance. Indiscriminate chat control must be taken off the table!

In the European Parliament, as your MEP for the Pirate Party Germany, I took a leading role in the negotiations to prevent chat control and convinced members of the other political groups in countless discussions. Today Markéta Gregorová carries the torch of the digital freedom fighters in the European Parliament for the European Pirates and continues the fight against chat control there.

In the Council of the EU, we have so far been able to prevent totalitarian, blanket chat control only thanks to the tireless commitment and loud protest of civil society. This was possible thanks to every activist and every individual who kept reaching out to politicians and governments by phone, email, or letter. Thank you for your efforts! We must stay strong and fight every further attempt at chat control, an extreme and globally unique threat to the privacy of digital correspondence and to secure encryption!

Topic page on chat control: chatkontrolle.de


Simon Willison

PyPI now supports digital attestations


Dustin Ingram:

PyPI package maintainers can now publish signed digital attestations when publishing, in order to further increase trust in the supply-chain security of their projects. Additionally, a new API is available for consumers and installers to verify published attestations.

This has been in the works for a while, and is another component of PyPI's approach to supply chain security for Python packaging - see PEP 740 – Index support for digital attestations for all of the underlying details.

A key problem this solves is cryptographically linking packages published on PyPI to the exact source code that was used to build those packages. In the absence of this feature there are no guarantees that the .tar.gz or .whl file you download from PyPI hasn't been tampered with (to add malware, for example) in a way that's not visible in the published source code.

These new attestations provide a mechanism for proving that a known, trustworthy build system was used to generate and publish the package, starting with its source code on GitHub.

The good news is that if you're using the PyPI Trusted Publishers mechanism in GitHub Actions to publish packages, you're already using this new system. I wrote about that system in January: Publish Python packages to PyPI with a python-lib cookiecutter template and GitHub Actions - and hundreds of my own PyPI packages are already using that system, thanks to my various cookiecutter templates.

Trail of Bits helped build this feature, and provide extra background about it on their own blog in Attestations: A new generation of signatures on PyPI:

As of October 29, attestations are the default for anyone using Trusted Publishing via the PyPA publishing action for GitHub. That means roughly 20,000 packages can now attest to their provenance by default, with no changes needed.

They also built Are we PEP 740 yet? (key implementation here) to track the rollout of attestations across the 360 most downloaded packages from PyPI. It works by hitting URLs such as https://pypi.org/simple/pydantic/ with an Accept: application/vnd.pypi.simple.v1+json header - here's the JSON it returns.
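As a rough illustration, that request can be reproduced with a few lines of Python; the files and provenance field names follow PEP 691/740, so double-check them against the live response before relying on them:

import json
import urllib.request

# Fetch the JSON Simple API page for a package
req = urllib.request.Request(
    "https://pypi.org/simple/pydantic/",
    headers={"Accept": "application/vnd.pypi.simple.v1+json"},
)
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# Print each file and any attached provenance information (per PEP 740)
for f in data.get("files", []):
    print(f.get("filename"), "->", f.get("provenance"))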

I published an alpha package using Trusted Publishers last night and the files for that release are showing the new provenance information already:

Which links to this Sigstore log entry with more details, including the Git hash that was used to build the package:

Sigstore is a transparency log maintained by Open Source Security Foundation (OpenSSF), a sub-project of the Linux Foundation.

Via Hacker News

Tags: packaging, pypi, python, supply-chain, github, dustin-ingram, github-actions, psf


QuickTime video script to capture frames and bounding boxes


An update to an older TIL. I'm working on the write-up for my DjangoCon US talk on plugins and I found myself wanting to capture individual frames from the video in two formats: a full frame capture, and another that captured just the portion of the screen shared from my laptop.

I have a script for the former, so I got Claude to update my script to add support for one or more --box options, like this:

capture-bbox.sh ../output.mp4 --box '31,17,100,87' --box '0,0,50,50'

Open output.mp4 in QuickTime Player, run that script and then every time you hit a key in the terminal app it will capture three JPEGs from the current position in QuickTime Player - one for the whole screen and one each for the specified bounding box regions.

Those bounding box regions are percentages of the width and height of the image. I also got Claude to build me this interactive tool on top of cropperjs to help figure out those boxes:
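For a sense of how those percentage boxes translate into pixel crops, here's a small Python sketch; the left, top, width, height ordering of the four numbers is an assumption for illustration, not necessarily how the script parses them:

def box_to_pixels(box, image_width, image_height):
    # Convert a 'left,top,width,height' percentage string into pixel values
    left, top, width, height = (float(part) for part in box.split(","))
    return (
        int(image_width * left / 100),
        int(image_height * top / 100),
        int(image_width * width / 100),
        int(image_height * height / 100),
    )

print(box_to_pixels("31,17,100,87", 1920, 1080))  # (595, 183, 1920, 939)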

Tags: claude-artifacts, ai-assisted-programming, claude, tools, projects, ffmpeg, llms, ai, generative-ai


Ben Werdmüller

Is BlueSky the new Twitter, and if so is that a good thing?


[Mathew Ingram at The Torment Nexus]

Mathew wrestles with where Bluesky sits in the future of social media given its connections to venture capital and blockchain supporters:

"I have no doubt that, as Cory says, Graber and the other founders of Bluesky are sincere in their desire to build an open service with a federated protocol, etc. But history has shown time and again that economic interests often interfere with the best efforts of founders."

Here's my slightly controversial take: I think there's something to learn from blockchain and how it developed. We already see that represented in the data structures Bluesky uses behind the scenes, and beyond that, an optimistic cultural take on decentralization. That doesn't mean crypto markets aren't full of scammers - there's certainly more to avoid than to learn from - but blockchain is not an irrelevant pursuit, even if blockchains themselves are not the best route forwards.

This feels right to me:

"In the short term at least, it seems as though we could have three or four competing social networks: one, Twitter/X, is the place for right-wing Musk fans and tech bros and Trump supporters (and journalists and others who need to be there for work); Bluesky is the place for that early Twitter anything-goes vibe plus journalists and real-time news; Mastodon is the place for nerds and geeks and others who like the nuts-and-bolts of social tech; and Threads is... well, Threads is whatever is left over after all of those other things are removed :-)"

Let's see what happens.

#Fediverse

[Link]


The Pragmatic Engineer

The Pulse #115: LLM improvements slowing down?

Several signs indicate that improving LLMs with more training/compute is no longer efficient. Also: dev charged $1,100 after following a tutorial, a reminder to be vigilant with open source, and more

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. ChatGPT to compete with Google on Search, another large fundraise for an AI coding tool, Klarna files for IPO, possible public/private cloud equilibrium, and more.

LLM improvements slowing down? Several signs indicate the next versions of LLM foundational models won’t get much better despite the extra compute and training data. Reports at OpenAI, Anthropic, and Google all suggest a slowdown in development, which could prompt a search for new routes to improvement.

Dev charged $1,100 after AWS tutorial. An AWS tutorial on how to integrate OpenSearch omitted key details, like how it launches an expensive service, and that shutting down the sandbox environment doesn’t shut down the OpenSearch service. A dev got burned financially, and is unhappy the tutorial still hasn’t been updated for clarity.

Open source project vigilance reminder. A look at an attempt to add malicious code to the Exo open source project. The attack seemed rather amateurish, and was perhaps intended to be spotted. It’s a reminder that open source projects are increasingly the targets of malicious activity.

1. Industry pulse

ChatGPT launches Search

In big news, OpenAI has launched ChatGPT search, which looks like a direct competitor to Google. The search functionality summarizes results on the web and links to sources, making them available as citations. It’s an approach similar to the AI search engine Perplexity.

ChatGPT’s new search interface, available for Pro customers
ChatGPT’s search output mixes summaries of search results and links to original sources

ChatGPT preparing to go head-to-head with Google matters because it has 200 million weekly active users, as of August. We don’t know how many monthly active users Google Search has, but it serves more than 5 billion searches per day, and YouTube has over 2 billion logged-in monthly users. A truly fascinating thing about OpenAI’s numbers is that they were zero just two years ago, before ChatGPT launched on November 30, 2022.

If ChatGPT’s Search product gains momentum, it’s possible Google could adapt its own search UX to a summarization interface. It would be a drastic change, but it’s hard to imagine Google standing idle while its search product is potentially being out-innovated.

Another big AI fundraiser: Tessl

Read more


Ben Werdmüller

Bluesky, the Fediverse, and the future of social media


I care a lot about the future of social media. It’s how many of us learn about the world and connect to each other; putting something so important in the hands of a handful of centralized corporations has repeatedly proven itself to be harmful. That’s why I’m so excited about the growth of federated and decentralized social media in the wake of Elon Musk’s disastrous acquisition of X. These platforms give more control to communities and individuals, reducing the risks of a central corporation manipulating the global conversation through algorithms or other means.

Although a lot of my focus has been on Fediverse platforms like Mastodon, from time to time I’ve mentioned that I’m really impressed with what the Bluesky team has achieved. The Bluesky platform is growing very quickly and seems to be the go-to choice for less-technical users like journalists, politicians, and so on who want to leave X. Bluesky offers valuable insights for anyone interested in the future of social media and how to build a vibrant alternative platform.

Easy to understand onboarding: You register at the Bluesky site. To get you started, you can access “starter packs” of users to follow around various topics, so your feed is never empty. Here’s a starter pack of ProPublica journalists, for example, or people in tech from underrepresented communities.

It feels alive: The posts are both timely and engaging. This is in contrast to Mastodon, where they’re purely chronologically-ordered, or Threads, where I was still seeing hopeful posts from before the election a week later (because they piggybacked on the Instagram algorithm, which is optimized for a different kind of content). News can actually break here — and so can memes. Find an old-timer and ask them about ALF: an inside joke that I absolutely refuse to log an explanation for here.

Search works universally: It simply doesn’t on Mastodon, and I can only describe the search engine on Threads as weird.

It’s moderated and facilitated: The site has easy-to-understand moderation. More than that, the team seems to have invested in the culture of the community they’re creating. Particularly in the beginning, they did a lot of community facilitation work that set the tone of the place. The result — so far — is a palpable sense of fun in contrast to a seriousness that pervades both Threads and Mastodon.

At the same time, Bluesky benefits from an open mindset, an open-source codebase, and a permissionless protocol that allows anyone to build tools on top of it. Critics will note that it isn’t really decentralized yet: there’s one dominant personal data store that basically everyone is attached to. In contrast to Mastodon’s model of co-operative communities anchored by a non-profit, Bluesky is a venture funded startup that grew out of Twitter.

Other critics complain about the involvement of Jack Dorsey, who created Twitter and therefore a lot of the problems that we’re all trying to get away from. I don’t think that’s a valid complaint: he famously both established Bluesky because he felt that Twitter should have been a protocol rather than a company, and both left the board and closed his account after becoming dissatisfied with the way Bluesky was run as a moderated community. He has since described X as “freedom technology” and put a ton of his own money behind Nostr. I’ve personally found Nostr to be a particularly toxic decentralized network dominated by Bitcoin-loving libertarians. This may indicate where his priorities lie.

I’ll be honest: on paper, I like Mastodon’s model better. It’s a community-driven effort paid for transparently by donations, much like any non-profit. (Much like any non-profit, the bulk of the funding comes from larger entities, but these are advertised on the Mastodon website alongside smaller-dollar donors.) I also like the co-operative model where smaller communities can dictate their own norms but interoperate with the larger network, which means that, for example, communities for trans posters or journalists can provide more directed support.

But this model faces a much harder road. It means, firstly, that there is less money to go around (Bluesky has raised $36M so far; Mastodon raised €326K in 2022), and secondly, that it’s harder to understand for a new user who wants to join in. It’s also clear that CEO Jay Graber has established a cohesive team that by all accounts is a lot of fun to work in. That counts for a lot and has helped to establish a healthy community.

Even with its hurdles, Mastodon’s model embodies a rare, user-first ethos, and I believe it’s worth supporting. In the end, the future of social media may depend on which values we choose to uphold.

I suspect both will continue to exist side by side. If I had to guess, Bluesky might become a mainstream platform for people who want something very close to pre-acquisition Twitter (which it is rapidly becoming right now), and the Fediverse might become the default glue between any social platform. For example, I post my book reading activity on Bookwyrm, which I find more useful in its own right than Goodreads. Other people can follow and interact with my book reviews there, or they can follow from other Fediverse-compatible platforms like Mastodon. (Right now, my followers are about half and half). Mastodon itself will allow niche supportive communities to grow, and of course, the fact that Threads is building Fediverse support means that any of its hundreds of millions of users will be able to interact with anyone on any other Fediverse platform.

Bluesky may evolve into a streamlined alternative to Twitter, while the Fediverse could serve as a decentralized, cross-platform connector among diverse networks. This dynamic offers a promising future for users, with both worlds learning from one another in a productive tension that has the potential to strengthen the open social web. That’s good news for everyone who values an open, user-driven future for social media.


The Onion acquires Infowars


[Oliver Darcy at Status]

I literally had to check to see if this was real:

"The Onion has successfully acquired Infowars.

The satirical news outlet purchased Alex Jones' right-wing conspiracy empire at a court-ordered auction, the families of the victims of the Sandy Hook Elementary School shooting announced Thursday."

I cannot think of a more fitting end for such a toxic, falsehood-filled media outlet. Of course The Onion should own it. Where better than the original home of fake news?

Clearly the Sandy Hook families felt the same way: they actually decided to forgo part of the money owed to them in order to make this happen.

"While Jones will no longer own Infowars, he has indicated that he will continue to broadcast after losing control of the media company."

May he lose that one too.

#Media

[Link]


Simon Willison

Releasing the largest multilingual open pretraining dataset


Common Corpus is a new "open and permissible licensed text dataset, comprising over 2 trillion tokens (2,003,039,184,047 tokens)" released by French AI Lab PleIAs.

This appears to be the largest available corpus of openly licensed training data:

926,541,096,243 tokens of public domain books, newspapers, and Wikisource content
387,965,738,992 tokens of government financial and legal documents
334,658,896,533 tokens of open source code from GitHub
221,798,136,564 tokens of academic content from open science repositories
132,075,315,715 tokens from Wikipedia, YouTube Commons, StackExchange and other permissively licensed web sources
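As a quick sanity check, those five categories add up to exactly the quoted total:

# The five category counts listed above sum to the 2,003,039,184,047
# token total quoted from the Common Corpus announcement.
counts = [
    926_541_096_243,
    387_965_738_992,
    334_658_896_533,
    221_798_136_564,
    132_075_315_715,
]
print(f"{sum(counts):,}")  # 2,003,039,184,047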

It's majority English but has significant portions in French and German, and some representation for Latin, Dutch, Italian, Polish, Greek and Portuguese.

I can't wait to try some LLMs trained exclusively on this data. Maybe we will finally get a GPT-4 class model that isn't trained on unlicensed copyrighted data.

Via @dorialexander

Tags: ethics, generative-ai, training-data, ai, llms


John Philpin : Lifestream


Someone suggested that Coldplay playing to 150,000 over three nights has ‘injected millions into local economy’. Not sure I see it … until I know if local economy is ‘Auckland’ or ‘New Zealand’ .. and how many people in each of those are really from ‘out of town / country’.

Wednesday, 13. November 2024

IdM Laboratory

The OpenID Foundation Workshop materials have been published

Hello, this is Fujie.

The materials from the OpenID Foundation Workshop, held at Microsoft Silicon Valley the day before IIW (which I covered the other day), have been published.

My earlier review post: https://idmlab.eidentity.jp/2024/10/openid-foundation-workshop.html
The published materials: https://openid.net/presentations-media/

Ben Werdmüller

Why the Guardian is no longer posting on X


[The Guardian]

Given the reluctance to leave X among most publishers, the Guardian is taking a big leadership role here by refusing to continue to post to X:

"This is something we have been considering for a while given the often disturbing content promoted or found on the platform, including far-right conspiracy theories and racism. The US presidential election campaign served only to underline what we have considered for a long time: that X is a toxic media platform and that its owner, Elon Musk, has been able to use its influence to shape political discourse."

X users will continue to be able to share links to the Guardian, which is simply a property of them being a web platform. The Guardian also reserves the right to embed tweets when they are newsworthy.

I couldn't agree with their reasoning more, and I sincerely hope that more publications follow suit. I also predict that this won't hurt the Guardian's metrics overall, at least in the medium term.

I also appreciate their note at the bottom of the article:

"Thankfully, we can do this because our business model does not rely on viral content tailored to the whims of the social media giants’ algorithms – instead we’re funded directly by our readers."

Yet another reason why patronage models are far better than advertising.

#Media

[Link]


John Philpin : Lifestream


So I will plant this post in various places across the inter webs as I reach out to find those customers that need what I do.

The link is worth clicking on, but that’s not really what this is about. Except, in a way, it kinda is… let me explain.

This part of John’s business model is a weekly sponsorship. He sells them upfront. He can sell two on the same week. And if one becomes open suddenly

Here’s the deal: my calendar is showing some similar gaps. Now, I don’t operate exactly like John Gruber—heck, we’re not even in the same space—but at the core, the goal is the same.

And honestly, for-profit organizations aren’t as different from each other as we might like to think.

I like to imagine every organization as a magic box. It takes inputs—people, capital, resources—and transforms them into outputs: products and services that are sold to customers. Ideally, every window of time is filled with something that generates revenue for the business.


This urgency is one reason why so many rush to create products that can “make money while you sleep.” The aim is to replace perishable inventory with non-perishable. And while that’s a great approach, it’s not what I do, because—despite what I just said—every organization is unique.

So here I am, reaching out to the internet to help fill those gaps.

If you have a business—or know someone who does—that’s struggling to boost revenue, let’s chat. I’ve got frameworks, methods, and a book that’s gotten some praise from fellow writers worldwide.

Over the years, I’ve tried it all: flywheels, funnels, category marketing, guerrilla sales, challenger sales, social selling—you name it, I’ve probably experimented with it.

Here’s the biggest takeaway:

It’s all about the synergy of teamwork and a shared purpose.

If you’re ready to turn those inputs into something extraordinary, drop me a line. Let’s make some magic happen.



And?




This is not going to end well.

and I don’t mean posting to Micro.Blog


Been wondering why my posts haven’t been showing up - turns out on the switch to a new device a decision was made to post to the test blog!


Simon Willison

Quoting Steve Klabnik


This tutorial exists because of a particular quirk of mine: I love to write tutorials about things as I learn them. This is the backstory of TRPL, of which an ancient draft was "Rust for Rubyists." You only get to look at a problem as a beginner once, and so I think writing this stuff down is interesting. It also helps me clarify what I'm learning to myself.

Steve Klabnik, Steve's Jujutsu Tutorial

Tags: version-control, writing


Ollama: Llama 3.2 Vision


Ollama released version 0.4 last week with support for Meta's first Llama vision model, Llama 3.2.

If you have Ollama installed you can fetch the 11B model (7.9 GB) like this:

ollama pull llama3.2-vision

Or the larger 90B model (55GB download, likely needs ~88GB of RAM) like this:

ollama pull llama3.2-vision:90b

I was delighted to learn that Sukhbinder Singh had already contributed support for LLM attachments to Sergey Alexandrov's llm-ollama plugin, which means the following works once you've pulled the models:

llm install --upgrade llm-ollama
llm -m llama3.2-vision:latest 'describe' \
  -a https://static.simonwillison.net/static/2024/pelican.jpg

This image features a brown pelican standing on rocks, facing the camera and positioned to the left of center. The bird's long beak is a light brown color with a darker tip, while its white neck is adorned with gray feathers that continue down to its body. Its legs are also gray.

In the background, out-of-focus boats and water are visible, providing context for the pelican's environment.

That's not a bad description of this image, especially for a 7.9GB model that runs happily on my MacBook Pro.
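The same call can also be made from Python via LLM's attachment support. This is a rough sketch that assumes the llm-ollama plugin is installed and the model has been pulled; check the current LLM API docs for the exact signatures:

import llm

# Roughly equivalent to the CLI invocation above
model = llm.get_model("llama3.2-vision:latest")
response = model.prompt(
    "describe",
    attachments=[llm.Attachment(url="https://static.simonwillison.net/static/2024/pelican.jpg")],
)
print(response.text())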

Tags: vision-llms, llm, llama, ai, edge-llms, llms, meta, ollama, generative-ai


django-plugin-django-debug-toolbar


Tom Viner built a plugin for my DJP Django plugin system that configures the excellent django-debug-toolbar debugging tool.

You can see everything it sets up for you in this Python code: it configures installed apps, URL patterns and middleware and sets the INTERNAL_IPS and DEBUG settings.
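For flavor, a DJP plugin that wires up django-debug-toolbar might look roughly like this sketch; the hook names follow DJP's documented plugin hooks, but the exact body of Tom's plugin differs, so treat this as illustrative:

import djp

@djp.hookimpl
def installed_apps():
    # Add debug_toolbar to INSTALLED_APPS
    return ["debug_toolbar"]

@djp.hookimpl
def middleware():
    # Plain strings are appended to MIDDLEWARE
    return ["debug_toolbar.middleware.DebugToolbarMiddleware"]

@djp.hookimpl
def urlpatterns():
    from django.urls import include, path
    return [path("__debug__/", include("debug_toolbar.urls"))]

@djp.hookimpl
def settings(current_settings):
    # Mutate the settings dictionary in place
    current_settings["INTERNAL_IPS"] = ["127.0.0.1"]
    current_settings.setdefault("DEBUG", True)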

Here are Tom's running notes as he created the plugin.

Via @tomviner

Tags: djp, django, plugins


John Philpin : Lifestream


“So are Sapiens and Neanderthals the same species or different species? And is “species” an objective reality that biologists discover, or is it an intersubjective reality that biologists impose?”

💬 Yuval Noah Harari

🔗🖇️ Coincidentally, have been exploring this idea as applied to Mathematics.

Tuesday, 12. November 2024

Simon Willison

Ars Live: Our first encounter with manipulative AI


I'm participating in a live conversation with Benj Edwards on 19th November reminiscing over that incredible time back in February last year when Bing went feral.

Via @benjedwards

Tags: bing, generative-ai, arstechnica, ai, speaking, llms, benj-edwards


Qwen2.5-Coder-32B is an LLM that can code well that runs on my Mac


There's a whole lot of buzz around the new Qwen2.5-Coder Series of open source (Apache 2.0 licensed) LLM releases from Alibaba's Qwen research team. On first impression it looks like the buzz is well deserved.

Qwen claim:

Qwen2.5-Coder-32B-Instruct has become the current SOTA open-source code model, matching the coding capabilities of GPT-4o.

That's a big claim for a 32B model that's small enough that it can run on my 64GB MacBook Pro M2. The Qwen published scores look impressive, comparing favorably with GPT-4o and Claude 3.5 Sonnet (October 2024) edition across various code-related benchmarks:

How about benchmarks from other researchers? Paul Gauthier's Aider benchmarks have a great reputation and Paul reports:

The new Qwen 2.5 Coder models did very well on aider's code editing benchmark. The 32B Instruct model scored in between GPT-4o and 3.5 Haiku.

84% 3.5 Sonnet
75% 3.5 Haiku
74% Qwen2.5 Coder 32B
71% GPT-4o
69% Qwen2.5 Coder 14B
58% Qwen2.5 Coder 7B

That was for the Aider "whole edit" benchmark. The "diff" benchmark scores well too, with Qwen2.5 Coder 32B tying with GPT-4o (but a little behind Claude 3.5 Haiku).

Given these scores (and the positive buzz on Reddit) I had to try it for myself.

My attempts to run the Qwen/Qwen2.5-Coder-32B-Instruct-GGUF Q8 using llm-gguf were a bit too slow, because I don't have that compiled to use my Mac's GPU at the moment.

But both the Ollama version and the MLX version worked great!

I installed the Ollama version using:

ollama pull qwen2.5-coder:32b

That fetched a 20GB quantized file. I ran a prompt through that using my LLM tool and Sergey Alexandrov's llm-ollama plugin like this:

llm install llm-ollama
llm models # Confirming the new model is present
llm -m qwen2.5-coder:32b 'python function that takes URL to a CSV file and path to a SQLite database, fetches the CSV with the standard library, creates a table with the right columns and inserts the data'

Here's the result. The code worked, but I had to work around a frustrating ssl bug first (which wouldn't have been an issue if I'd allowed the model to use requests or httpx instead of the standard library).
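For reference, a standard-library-only function of the kind that prompt asks for could look like this minimal sketch (not the model's actual output; the TEXT-only columns and default table name are simplifying assumptions):

import csv
import io
import sqlite3
import urllib.request

def load_csv_to_sqlite(csv_url, db_path, table="data"):
    # Fetch the CSV over HTTP using only the standard library
    with urllib.request.urlopen(csv_url) as resp:
        text = resp.read().decode("utf-8")
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]

    # Create a table with one TEXT column per CSV header and insert the rows
    conn = sqlite3.connect(db_path)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
    conn.commit()
    conn.close()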

I also tried running it using the Apple Silicon fast array framework MLX using the mlx-llm library directly, run via uv like this:

uv run --with mlx-lm \
  mlx_lm.generate \
  --model mlx-community/Qwen2.5-Coder-32B-Instruct-8bit \
  --max-tokens 4000 \
  --prompt 'write me a python function that renders a mandelbrot fractal as wide as the current terminal'

That gave me a very satisfying result - when I ran the code it generated in a terminal I got this:
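For anyone who wants to try the same prompt, a terminal-width Mandelbrot renderer can be as small as this sketch (my own illustration, not the model's output):

import shutil

def mandelbrot(height=24, max_iter=30):
    # Render an ASCII Mandelbrot set as wide as the current terminal
    width = shutil.get_terminal_size().columns
    chars = " .:-=+*#%@"
    for row in range(height):
        y = (row / height) * 2.4 - 1.2
        line = []
        for col in range(width):
            x = (col / width) * 3.5 - 2.5
            z, c = 0j, complex(x, y)
            i = 0
            while abs(z) <= 2 and i < max_iter:
                z = z * z + c
                i += 1
            line.append(chars[int(i / max_iter * (len(chars) - 1))])
        print("".join(line))

mandelbrot()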

MLX reported the following performance metrics:

Prompt: 49 tokens, 95.691 tokens-per-sec
Generation: 723 tokens, 10.016 tokens-per-sec
Peak memory: 32.685 GB

Let's see how it does on the Pelican on a bicycle benchmark.

llm -m qwen2.5-coder:32b 'Generate an SVG of a pelican riding a bicycle'

Here's what I got:

Questionable Pelican SVG drawings aside, this is a really promising development. 32GB is just small enough that I can run the model on my Mac without having to quit every other application I'm running, and both the speed and the quality of the results feel genuinely competitive with the current best of the hosted models.

Given that code assistance is probably around 80% of my LLM usage at the moment this is a meaningfully useful release for how I engage with this class of technology.

Tags: open-source, ai, generative-ai, edge-llms, llms, ai-assisted-programming, llm, uv, qwen, mlx, ollama


IdM Laboratory

The passkey test site has been relaunched

Hello, this is Fujie.

I have introduced sites for testing WebAuthn a few times before; this time Okta (Auth0) has relaunched its passkey learning site, so let's give it a try.

https://learnpasskeys.io/ja

Remarkably, it supports Japanese.

The registration details such as the user ID and username are fixed in advance, but you can try it out as a demo, and it also gathers resources including the API specification.

It's a very good site both for people implementing passkeys from scratch and for those who want to learn them from the basics, so I recommend taking a look.


Doc Searls Weblog

Remembering Paul Marshall


In a vote for “Senior Superlatives” among his 36 classmates at Concordia Prep, Paul Marshall won in several categories. The yearbook staff, however, limited the Superlative distinction to one per student, and Paul chose to be recognized for his wit, which was boundless. He was also the editor-in-chief of The Prepster, our student paper, because he was the best writer. He was the best musician, playing the organ in our twice-daily chapel services, and sousaphone in the school band. He even taught sophomore biology when the teacher was sick for a few weeks.

Once in a science class, a teacher asked Paul to describe a natural law. His response was, “Matter can be neither created nor destroyed. It can only be eaten.”

Concordia Prep was a Lutheran seminary for high school boys and an academic correctional facility for lousy students like me. In his autobiography, AJ Ontko said I would have been voted Least Likely to Achieve, had that been an option. Yet, after an academically and socially miserable year as a sophomore, Paul recruited me to be his roommate for the following year, and we stayed together until we graduated. During that whole time, my life improved in every way, thanks in large measure to Paul.

Take music. Paul taught me volumes about Bach, Purcell, Händel, and other composers, such as Peter Schickele (P.D.Q. Bach) and Tom Lehrer. He tutored through performance, which involved runs by train to New York City. One especially memorable trip involved Clarke & Purcell’s Trumpet Voluntary, performed on the massive pipe trumpets in the Cathedral of St. John the Divine. (A sample.)

Paul also taught me to believe in myself.  From a post a couple of months back:

I remember a day when a bunch of us were hanging in our dorm room, talking about SAT scores. Mine was the lowest of the bunch. (If you must know, the total was 1001: a 482 in verbal and a 519 in math. Those numbers will remain burned in my brain until I die.) Others, including Paul, had scores that verged on perfection—or so I recall. (Whatever, they were all better than mine.). But Paul defended me from potential accusations of relative stupidity by saying this: “But David has insight.” (I wasn’t Doc yet.) Then he gave examples, which I’ve forgotten. By saying I had insight, Paul kindly and forever removed another obstacle from my path forward in life. From that moment on, insight became my stock in trade. Is it measurable? Thankfully, no.

After high school, Paul went on to distinguished careers as a liturgical historian, a parish pastor, a professor at the Yale Divinity School, Bishop of the Episcopal Church’s Bethlehem Diocese in Pennsylvania, and an activist for progressive causes (notably in his church). Among other things. Our contacts through all of that were infrequent but always interesting and fun.

Of no importance but some interest is a moment we shared walking on a New York street at night with our young sons. One of the boys, looking at the moon, asked if it was full. In unison and without thinking, Paul and I both replied, “No, it’s empty.”

In like manner, we agreed to speak at each other’s funerals. Alas, we both missed that chance. Paul died of cancer on October 21, and his funeral was last Friday when I was in France.

Here is the Church’s obituary. Paul was 77, four days older than me.

My joke about mortality is that I know I’m in line for the exit, but I let others cut in. I wish Paul hadn’t.

Requiescat in pace, old friend. And forgive me for putting off our next meeting.

 


Ben Werdmüller

The corner café


We’ve got a new takeaway in the neighborhood, occupying a small space that was previously taken up by a pizza joint. Their opening was a bit of a saga — it seemed like they were waiting an age for permits to be approved — so it’s exciting to have them finally open.

I was also really intrigued: the owner is an award-winning chef who spends significant time supporting abused children. At the same time, their Instagram feed seemed to just be stock images of food, while there didn’t seem to be rhyme or reason to the enormous menu. Cheesesteaks sat alongside gumbo and salmon with mango salsa. I couldn’t begin to tell you what the through line was: everything seemed to have been chosen in the spirit of, “maybe they’ll buy this?”

I walked in on opening day to buy two burgers as a test order. It was chaos: the room was filled with black smoke, every surface in the kitchen was covered in ingredients that ran the gamut from spices to ready-made Belgian waffle mix, and laundry baskets full of fruit and vegetables sat on the floor. I counted seven people working in a space the size of a corner store.

Every meal seemed to be cooked one at a time. My burgers took 25 minutes to cook; they were largely ungarnished and incredibly expensive for what they were. One of them had a hair in it. The fries were pretty good but had been over-salted. The woman at the counter made sure to ask me to come back with feedback as they were just getting started.

My review so far might seem unkind, but I really want them to succeed. It would be lovely to have some decent food just down the street; while Philadelphia is rightly known for excellent restaurants, my part of the suburbs is not. So I was dismayed to read Instagram updates from the restaurant that apologized for long wait times, as well as a confusing update that I think was about not being open on Veteran’s Day and offsetting their menu options by a day as a result — though there’s no daily rotation to adjust, as far as I could tell.

Today’s update was that they’d hired more staff to make things go faster, and I’ve never seen a software analogy write itself so neatly and so clearly before. If you’ve ever been on a project that’s been falling behind, you’ll know that adding more people is a great way to fall further behind. Hiring more people when a team is struggling is often like trying to untangle a knot by adding more hands: without clear roles, it only gets messier.

I really want to tell them to focus on a small number of menu items, aimed at a specific community that they want to serve, and only grow once they’ve hit menu / eater fit. Clean out the kitchen, retain a smaller but well-trained staff, and design processes to get that food made repeatedly well in an acceptable time, and at a price that makes sense. Launching with a “this is for everyone!” mentality practically ensures that you’re releasing something for nobody at all.

It’s not my place, of course, although I probably will send them some gentle feedback about the hair in the burger at the very least. I genuinely hope they do succeed; I would like to eat there regularly. I’m cheering for them. But for now, from afar.


The Pragmatic Engineer

How to debug large, distributed systems: Antithesis

A brief history of debugging, why debugging large systems is different, and how the “multiverse debugger” built by Antithesis attempts to take on this challenging problem space

Debugging is one of those things all engineers do, but little has changed in how we debug for decades. For example, debugging by printing to the console output or by logging is still pretty common, even though there are decent debuggers that can be used across IDEs.

Believe it or not, some debugging tools today are actually less advanced than in the old days. Steve Yegge, head of engineering at Sourcegraph, said last year:

“I saw the best debugger I’ve ever used at GeoWorks, in 1990. To this day, I’ve yet to see a debugger do what theirs did back then: path choice on the fly, undo on the spot, or step an instruction backwards.”

This stagnant rate of progress makes it very interesting that there’s a small engineering team working today on building a much better debugging tool, which specifically focuses on debugging large and distributed systems. It’s called Antithesis, and is the focus of this article. 

Today, we cover:

Brief history of debugging

Antithesis’ “multiverse debugger”

Q&A with Antithesis co-founder, Will Wilson

Tech stack

Engineering team and culture

Advanced testing tools for better bug management

Tradeoffs of complexity

As always with these deep dives about a vendor, this publication has no affiliation with Antithesis, and was not paid for this article. Check out our ethics policy.

1. Brief history of debugging

Debugging and software development have gone hand in hand since the earliest days of computing. But why do we call it ‘debugging’? The etymology is a bit obscure, but it may well involve a real-life insect.

First “debugged” computer

In 1947, a team of scientists at Harvard University including computer science pioneer, Grace Hopper, found a moth trapped in a relay of the Mark II mainframe computer which was causing it to malfunction. The fault was documented, and the moth itself was added to a hand-written record, reading: “...first actual case of bug being found.”

First-ever “bug” report? Source: IPSJ Computer Museum and Naval History and Heritage Command

Faults were called “bugs” before this incident, but the serendipitous episode may have helped cement the term “debugging” in the lexicon. Several computer science papers from the 1950s mention “debugging” in passing, which suggests the word was in use and its meaning was common knowledge among professionals. It also appears in the 1963 manual of the first time-sharing operating system, the Compatible Time-Sharing System (CTSS).

Evolution of debugging tools

Programmers have always built tools to make their lives easier, and debuggers are a case in point. Here’s how the toolset evolved from the 1960s.

1960s: punch card era. The earliest debugging tools:

Typewriter debugging: DEC Debugging Tape (DDT): a debugger program that worked on a typewriter! It allowed modifying a program or its data on the mainframe while it was running. (DEC stands for “Digital Equipment Corporation”, a major computer hardware company of the time.) This was an era when the typewriter served as the command line interface. See the full manual for commands.

Online Debugging Tool (ODT): a family of debugger programs that allowed accessing memory via octal addresses while software ran. Also produced by the Digital Equipment Corporation.

1970s: (symbolic) debuggers. New, powerful programming languages like C, FORTRAN and COBOL were developed in the ‘70s, which allowed fetching of symbol maps that showed the memory addresses of variables. Symbol maps were used for more efficient debugging, as they made it unnecessary to manually track memory addresses. The tools in use today are symbolic debuggers.

Late 1970s: breakpoints. With the ability to inspect the memory of a running program and to get a memory dump, the next debugging task is to halt program execution on a given condition, like a variable reaching a certain value. Breakpoints allow for precisely that. 

The core functionality of halting program execution emerged in the 1940s, with approaches as involved as removing cables, deliberately crashing programs, or flipping hardware switches. Over time, the utility and usability of breakpoints evolved, and by the end of the ‘70s they appeared in symbolic debuggers in ways recognisable today. More advanced tools added the option of allowing a program to advance one step forward (step forward) and the more complex functionality of going back (step back).

Mid-1980s: “modern debugging.” From the 1980s, the software development experience continued to evolve with better terminals, more interactivity, and ever-tighter feedback loops. Debugging improvements followed a similar pattern. For example, in 1983 Turbo Pascal introduced its IDE with built-in debugging capabilities – which might have been the first “mainstream” IDE with debugging enabled. 

Graphic debugging tools with visual breakpoints and output were innovations of this time. Remote debugging – debugging programs running over networks – became possible with the spread of the internet.

Today’s debugging tools have modern features, such as:

Time-travel debugging. Also known as “reverse debugging”, this is most common within functional programming and in deterministic environments. It allows recreating issues and “stepping backwards” to figure out root causes. Today’s deep dive is on one such tool, Antithesis.

Record and replay debugging. The application state is recorded in each step of the process and can be replayed. Recordings tend to include memory state, memory interactions, inputs, and system resource status markers, among others. ReDevBug does this.

Automatic debugging. Tools that can automatically locate and sometimes fix bugs in code. These debuggers are usually ML or AI-driven.

AI-assisted debugging. The latest wave of debugging uses GenAI to predict and locate bugs in a more efficient manner. It’s early days, but we can expect more solutions like this.

2. Antithesis’s ‘multiverse debugger’

Antithesis was founded in 2018 with the vision of a better way to test systems, and it has raised an impressive $47M (!!) in seed funding. The business model is usage-based pricing, based on the number of CPUs used for testing activities; a good analogy is Amazon renting out its EC2 servers. Today, Antithesis sells cores on an annually-reserved basis with a minimum commitment to get started, and hopes to offer more flexibility in the future, I’m told.

Time-travel debugging tools are usually limited to functional languages where state management is simple, or to deterministic environments, like well-defined sandboxes. For most real-world programs, no time travel option is available for debugging, so when a backend service crashes non-deterministically, there’s no good way to turn back time and investigate it; the best option is usually to add more logging to help explain future crashes.

Building a time machine

The Antithesis team spent several years building a system that acts like a time machine. It wraps your current system, and lets you rewind your steps. Within the “wrapper”, to rewind the state of the system to 5 seconds earlier, you type: 

branch = branch.end.rewind(5).branch 

Files deleted within the last five seconds come back, even if they were deleted permanently without passing through any trash folder. Any changes made to files since then are also undone.

Creating the time machine means creating a deterministic simulation, which can progress from its starting point to the future, arbitrarily. It can go back in time, too, which raises interesting possibilities. For example, if your server crashed: wouldn’t it be great to “rewind” time and attach a debugger? In a simulated system, you can do this: simulate the system to the point where the process will crash, then add a debugger or export a memory dump. Similarly, if a user reports that their session was slow: it’s now possible to go “back in time” by recreating their session, and attaching a debugger.
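
To make the idea concrete, here is a minimal sketch, not Antithesis's implementation, of why determinism makes rewinding and fast-forwarding possible: if the simulator owns every source of nondeterminism (time and randomness here), any past or future state can be reproduced simply by replaying the same seed for a different number of steps. The simulated system below is invented purely for illustration.

import random
from dataclasses import dataclass, field

@dataclass
class SimState:
    virtual_time: float = 0.0               # simulated clock, not wall-clock time
    events: list = field(default_factory=list)

def simulate(seed: int, steps: int) -> SimState:
    """Replay from scratch; the same seed and step count always give the same state."""
    rng = random.Random(seed)                # the only source of randomness
    state = SimState()
    for step in range(steps):
        state.virtual_time += rng.uniform(0.1, 1.0)     # deterministic "time"
        if rng.random() < 0.05:                         # deterministic "fault"
            state.events.append((step, "network partition"))
    return state

now = simulate(seed=42, steps=1000)
five_steps_ago = simulate(seed=42, steps=995)    # "rewind" = replay fewer steps
far_future = simulate(seed=42, steps=10_000)     # "fast-forward" = replay more steps
assert simulate(42, 1000).events == now.events   # perfectly reproducible

A real implementation snapshots state rather than replaying from zero, but the property that matters, identical inputs producing identical histories, is the same.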

Having a deterministic simulator creates previously hard-to-achieve scenarios, such as:

Jump into the future; for example, by fast-forwarding a system 10 hours into the future to inspect how memory and CPU usage will trend at that time. This is not a prediction: the simulation actually advances, so you can inspect the future state of the system directly!

Generate more logs to work with. When a suspicious event is identified in the present, you can go back in time and add more logging to where this event may have originated from. You could also enable detailed logging across the system for a few minutes or seconds before an event occurs.

Change the past. Go back to before a crash happened, and change the code executing.

What Antithesis does

Antithesis is not only a time-traveling debugger, though. A good way to describe it is as “Deterministic Simulation Testing (DST) as a service.”

Deterministic Simulation Testing (DST) is a technique of building a simulation in which software can run in a single thread, and where you’re in control of all variables like time, randomness, etc., in order to achieve determinism during testing.

What is DST? One explanation

DST is a combination of:

Fuzzing: also referred to as “fuzz testing,” this is automated testing that feeds invalid, unexpected, or random inputs to a program.

Assertions: making logical statements that should always hold, and halting the program when an assertion fails; e.g., asserting that an integer variable’s value is always greater than zero, so the program halts when that condition is violated (see the sketch after this list).

Shotgun debugging: making random changes to software and seeing if they fix the bug. 

Time travel debugging: the ability to “step backward and forward in time,” within the state of the program.
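
Putting the first two ingredients together, here is a toy sketch (again, not Antithesis's implementation) of fuzzing plus assertions inside a deterministic harness: the random-looking inputs come from a seeded RNG, so any assertion failure can be reproduced exactly by re-running the same seed. The withdraw function is a made-up, deliberately buggy system under test.

import random

def withdraw(balance: int, amount: int) -> int:
    # Deliberately buggy system under test: it allows overdrafts.
    return balance - amount

def run_case(seed: int) -> None:
    rng = random.Random(seed)
    balance = rng.randint(0, 100)
    amount = rng.randint(0, 150)          # fuzzed input, sometimes "invalid"
    new_balance = withdraw(balance, amount)
    # Assertion: a property that should always hold.
    assert new_balance >= 0, f"seed={seed}: balance went negative ({new_balance})"

failing_seeds = []
for seed in range(1_000):                 # the fuzzing loop
    try:
        run_case(seed)
    except AssertionError:
        failing_seeds.append(seed)        # replaying any of these seeds reproduces the failure exactly

print(f"{len(failing_seeds)} failing cases found; re-run run_case(seed) to reproduce any of them")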

Doing Deterministic Simulation Testing is really hard for any system because you have to build everything from scratch: existing frameworks and libraries can’t be used unless they support time travel, debugging, fuzzing, and the rest. One of the first “proper” usages of DST was within the distributed database FoundationDB, one of whose creators is Antithesis cofounder Will Wilson.

Because implementing DST is so difficult, Antithesis made the computer/hypervisor deterministic instead. This means anything that runs on this Antithesis computer/hypervisor can be tested with DST, without you having to build everything yourself.

And thanks to running a fully deterministic environment, Antithesis can manipulate it into weird states on purpose, which allows developers to inspect weird states and bugs to find out their causes. Read more on how Antithesis works.

3. Q&A with Antithesis co-founder, Will Wilson

The company’s CEO took some questions from us, and in this section the questions are italicized, with Will’s responses in normal text.

How debugging large systems is different

The Antithesis tool was built to debug large and complex systems, but how are these systems different from common ones like single services, apps, and single-threaded websites?

‘A few things make large systems different:

Bizarre failures are a certainty. If your software runs on one computer, things like bitflips in memory, or corruption on disk are exceptionally uncommon. If your software runs on a cluster with tens of thousands of machines, you’d better have a plan for it.

Expectations are usually higher. If your software runs on one computer and it crashes, there’s not a lot your software can do to improve the situation, except not lose any durable state. But if your software runs on a large cluster, people probably expect it to keep functioning if one or two machines die.

Concurrency plays a bigger role. You can get in trouble with multi-threaded systems on a single machine, but with many machines and unreliable/noisy networks between them, it gets so much worse.

Timestamps are meaningless. Unless you’re Google and have atomic clocks in your datacenters, you need to assume that clocks on different machines are not perfectly synchronized, which can make reading logs very confusing. You may literally not know whether an event started on system A or system B!

Large systems probably don’t “fit inside the head” of any person, which can make reasoning through the state machine the old-fashioned way much harder. Also, the sheer length of time and number of people it takes to build these systems means there are many opportunities for institutional knowledge to be lost and memories to fade.’

‘All of the above make testing and debugging large systems much harder, especially the first three points. Many failure modes of large-scale systems are “external” or environmental, to do with hardware faults, network messages getting delayed, or weird pauses on a thread. These are harder to reason about in advance, and they’re monumentally harder to test for and debug, as they may depend on highly specific conditions or timings that are almost impossible to reproduce on demand.

‘The paradox of distributed systems is that a one-in-a-million bug can be a huge urgent problem because you’re processing millions of requests all the time, so the bug will occur frequently. But, it’s still a one-in-a-million bug, so a test probably won’t reproduce it!’

How Antithesis is used

Where does Antithesis fit into customers’ software delivery and process timelines?

‘We see customers using Antithesis in very different ways. There are teams who run short tests on almost every PR, or who run long tests overnight or weekends, and some teams only pull it out when trying to track down a really crazy bug. 

‘We don’t tell our customers they should eliminate any of their existing tests because it’s probably inexpensive to keep them, and we don’t want to be the cause of any outage or emergency. That said, many customers stop investing as much in non-Antithesis tests, and instead try to find ways to use our platform for as much testing as possible.

‘Some customers have come up with really creative ways to use our platform. Who said this tool can only be used for hunting bugs? It’s a general platform for looking for any behavior in software systems. For example, it can help answer questions like:

“Can function A ever run before function B? Or does this function ever get called with a negative parameter?”

‘Most of what Antithesis “replaces” is human effort of the really annoying, unpleasant kind, like adding logging and then waiting for the problem to happen again in production. Or designing weird, ad-hoc fault injection systems in end-to-end tests. Or writing a script to run an integration test dozens of times to chase down an intermittent problem that only occurs once every ten runs. Basically, the stuff no programmer enjoys doing.’

4. Tech stack

What is the tech stack behind Antithesis? DST is hard to do with existing libraries, so which frameworks might you use instead of writing bespoke ones?

‘We have a pretty bad case of “not-invented here” syndrome. Basically, compared to most companies, we see a larger cost to adopting lots of third-party dependencies. So we bias towards building tools in house that do exactly what we need, which means our tech stack is very “home-grown”.

‘Languages we use often:

C and C++: languages with low-level memory manipulation, helpful for high-performance scenarios and necessary for kernel-mode code

Rust: a modern programming language emphasizing performance, type safety and concurrency

Typescript: a language adding static typing for JavaScript. Popular across backend and frontend domains

Nix: a language to create derivations (precise descriptions of how contents of existing files are used to derive new files)

‘Our major dependencies:

Nix/NixOS: a tool for package management and system configuration

BigQuery: a managed serverless data warehouse product by Google

Hypervisor: we use a fork/rewrite of the FreeBSD kernel hypervisor, bhyve.

‘Our homegrown stack is surprisingly large!

Hypervisor: custom-built for our needs; more details here.

A fully-reactive browser-based Javascript notebook. It has sophisticated dependency tracking. We currently use it to deliver the multiverse debugging experience

Fuzzer: optimized for exploring the state space of interactive programs (read more about fuzzing)

Fault injector: a testing tool to deliberately introduce failures, errors or problematic conditions

Binary instrumentation for customer software: inserting additional code (instrumentation code) into a customer’s compiled program to analyze its behavior during runtime.

Customizable Linux environment: what customers’ software runs in

Build system: based on Nix, which glues our systems together

Infrastructure and security mechanisms, built to ensure we run a trusted computing base

‘One of the coolest things about working at Antithesis as an engineer is that if there’s any computer science topic you’re interested in, there’s a good chance we do it, at least a little.

Building a database

‘We started using BigQuery very early because the pricing model is unbeatable for a tiny startup with bursty workloads. But the data model did not make much sense for us. 

‘Our use case is to analyze ordered streams of events: logs, code coverage events, etc. But because we have a deterministic multiverse which can fork, the stream of events forms a tree structure rather than a single linear history! BigQuery is not well set up to handle trees of events, and neither is any other SQL database. 
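
To make the data-model problem concrete, here is a toy sketch (nothing to do with their SQL "skip tree") of event streams that form a tree because the simulated multiverse can fork; the linear history of any branch is the events along its ancestry.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Branch:
    parent: Optional["Branch"] = None
    events: List[str] = field(default_factory=list)

    def history(self) -> List[str]:
        """Linear history of this branch: ancestors' events first, then its own."""
        inherited = self.parent.history() if self.parent else []
        return inherited + self.events

root = Branch(events=["boot", "request 1"])
fork_a = Branch(parent=root, events=["inject fault", "crash"])   # one timeline
fork_b = Branch(parent=root, events=["request 2", "ok"])         # a sibling timeline

assert fork_a.history() == ["boot", "request 1", "inject fault", "crash"]
assert fork_b.history() == ["boot", "request 1", "request 2", "ok"]

Answering "what happened before this event?" means walking up the tree, which is exactly the kind of recursive query that warehouses built around linear tables struggle with.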

‘We managed to putter along for a while with crazy hacks. For instance, we built a new data structure called a "skip tree", inspired by the skip list, which we implemented in SQL. This data type greatly improved the asymptotic performance of our queries (the performance characteristics at scale). However, we eventually got to the point of regularly crashing BigQuery's planner, at which point we knew we had to move to something else. 

‘We evaluated Snowflake, and its Recursive CTE feature, and also evaluated a large number of other SQL and NoSQL databases, but nothing fundamentally fixed the problem.

‘We were hesitant to build our own database for ages, until a company hackathon where a team tried writing a proof-of-concept analytic database for folding Javascript functions up and down petabyte-scale trees, thrown together in a week using Amazon S3 and Lambda. It actually worked! 

‘We're cautious, and a lot of people on our team have built databases before. We know that the hardest part of building a database is not getting started; it's the end of the project, with testing and operationalizing. But we do have this really great technology for testing distributed systems! 

‘We decided to write a custom database for our needs, 100% tested with Antithesis. We would have no other test plan except for running it with Antithesis! We are now nearing the end of the project, and so far, it’s going well!

‘If we succeed, it would solve a huge number of production issues with BigQuery, and enable us to launch some amazing new features. Plus, this project gives us the ultimate empathy with customers. 

5. Engineering team and culture

Tell us about the engineering team’s values and practices.

Read more


Ben Werdmüller

I’m a neurology ICU nurse. The creep of AI in our hospitals terrifies me

[Michael Kennedy and Isobel Cockerell in Coda Story]

Just one example of many of AI being used to take agency away from ordinary workers:

"The upshot was, it took away our ability to advocate for patients. We couldn’t point to a score and say, ‘This patient is too sick, I need to focus on them alone,’ because the numbers didn’t help us make that case anymore. They didn’t tell us if a patient was low, medium, or high need. They just gave patients a seemingly random score that nobody understood, on a scale of one to infinity.

We felt the system was designed to take decision-making power away from nurses at the bedside. Deny us the power to have a say in how much staffing we need."

The piece goes on to discuss the mass surveillance that AI enables. In a world where a patient's discussions with the healthcare workers attending to them are recorded, to feed an agent or otherwise, all kinds of abuses become possible. Not only does it remove agency from the experts who should be advocating for patients, but consider the effects in a state with adverse reproductive healthcare laws, for example.

This is the salient point:

"The reasoning for bringing in AI tools to monitor patients is always that it will make life easier for us, but in my experience, technology in healthcare rarely makes things better. It usually just speeds up the factory floor, squeezing more out of us, so they can ultimately hire fewer of us."

And this tends to be true regardless of what the original intention might be. If a technology can be used to cut costs or squeeze more productivity out of a worker, absent of any other constraints, it absolutely will be. In healthcare, like many fields that depend on care, attention, and underlying humanity, that's not necessarily a good thing.

#AI

[Link]


John Philpin : Lifestream

🔗 Learn the Order of Adjectives in English. Not sure I eve

🔗 Learn the Order of Adjectives in English.

Not sure I ever ‘learned’ this - but intuitively it is what I do.


Over on Substack I published 🔗🖇️ this … At first blush,

Over on Substack I published 🔗🖇️ this

At first blush, it might seem to be about maths and it certainly connects, but it’s really about those who challenge the status quo, see something different, push boundaries, which, not coincidentally is exactly what great startups do.

Love to know what you all think?

🔗🔎 BusinessBytes | 🔗 More


Must be something about the beach today - 40 or so walkers s

Must be something about the beach today - 40 or so walkers suddenly appeared on our beach - a number of them armed with what I call ‘Alpine Sticks’ …


Ben Werdmüller

Congress About to Gift Trump Sweeping Powers to Crush Political Enemies

[Noah Hurowitz at The Intercept]

This should be a five alarm fire:

"Up for a potential fast-track vote next week in the House of Representatives, the Stop Terror-Financing and Tax Penalties on American Hostages Act, also known as H.R. 9495, would grant the secretary of the Treasury Department unilateral authority to revoke the tax-exempt status of any nonprofit deemed to be a “terrorist supporting organization.” [...] The law would not require officials to explain the reason for designating a group, nor does it require the Treasury Department to provide evidence."

Unbelievably, this is a bipartisan bill, despite its obviously harmful effects: if any non-profit can be stripped of its status without reason or evidence, the ability for an adverse administration to do harm with it is huge.

Non-profits one can imagine being affected include those reporting the news, providing reproductive healthcare, supporting vulnerable communities, aiding immigrants at risk of deportation, providing aid in places like Gaza, and more. It's a blank check to harm political opponents - and it seems ludicrous that it's on the verge of passing.

#Democracy

[Link]


John Philpin : Lifestream

Oh .. well why didn’t you say? …… Experimenting.

Oh .. well why didn’t you say? ……

Experimenting.


Ben Werdmüller

How I ship projects at big tech companies

[Sean Goedecke]

This is very good. It's advertised as a piece about shipping in big tech companies, but honestly, I think it's true of many smaller companies too. It's not true in the smallest startups or for organizations with certain kinds of engineering cultures - but I suspect they may be in the minority.

"What does it mean to ship? It does not mean deploying code or even making a feature available to users. Shipping is a social construct within a company. Concretely, that means that a project is shipped when the important people at your company believe it is shipped."

Software engineering isn't a technology business: it's a people business. You're building tools that solve real problems for real people, and you're doing it inside an organizational structure that is also made of real people. There's no way to get around this: unless the organization is exceptionally organized around engineering needs (which many small and medium tech companies are!), you will have to navigate these sorts of interpersonal dynamics.

This hits the nail on the head for just about everybody:

"I think a lot of engineers hold off on deploys essentially out of fear. If you want to ship, you need to do the exact opposite: you need to deploy as much as you can as early as possible, and you need to do the scariest changes as early as you can possibly do them."

It seems counterintuitive, but again: if your goal is to ship (and it probably should be), you need to focus on doing that.

#Technology

[Link]

Monday, 11. November 2024

Simon Willison

How I ship projects at big tech companies

This piece by Sean Goedecke on shipping features at larger tech companies is fantastic.

Why do so many engineers think shipping is easy? I know it sounds extreme, but I think many engineers do not understand what shipping even is inside a large tech company. What does it mean to ship? It does not mean deploying code or even making a feature available to users. Shipping is a social construct within a company. Concretely, that means that a project is shipped when the important people at your company believe it is shipped.

Sean emphasizes communication, building confidence and gaining trust, and the importance of deploying previews of the feature (for example using feature flags) as early as possible to get that crucial internal buy-in and feedback from other teams.

I think a lot of engineers hold off on deploys essentially out of fear. If you want to ship, you need to do the exact opposite: you need to deploy as much as you can as early as possible, and you need to do the scariest changes as early as you can possibly do them. Remember that you have the most end-to-end context on the project, which means you should be the least scared of scary changes.

Via Hacker News

Tags: software-engineering, management, feature-flags


Hyperonomy Digital Identity Lab

Decentralized Identifier (DID Method) Clusters: Spec Development Framework

Copyright (c) 2018-2024 Michael Herman (Alberta, Canada) – Creative Commons Attribution-ShareAlike 4.0 International Public License
https://creativecommons.org/licenses/by-sa/4.0/legalcode

0. Prime Objective/Directive

… the prime objective needs to be driving increased demand for and consumption of Eggs – aka DIDs (and VCs) – by Holders. …creating hundreds of thousands of ways that drive more Holders to consume more Eggs.

… are great examples of driving the demand for and consumption of more and more digital credentials [and DIDs] (eggs); and secondarily, the demand for hens and roosters (Issuers and Verifiers). The demand for eggs drives the production of hens; and in turn, the demand for roosters. Don’t mess with #MotherNature

Reference: https://www.linkedin.com/posts/mwherman_the-classic-chicken-and-egg-problem-verifiers-activity-7249162946709241856-lxaS?utm_source=share&utm_medium=member_desktop

1. Decentralized Identifiers (DIDs) v1.0 Specification

Reference: https://www.w3.org/TR/did-core/

Decentralized identifiers (DIDs) are a new type of identifier that enables verifiable, decentralized digital identity. A DID refers to any subject (e.g., a person, organization, thing, data model, abstract entity, etc.) as determined by the controller of the DID. In contrast to typical, federated identifiers, DIDs have been designed so that they may be decoupled from centralized registries, identity providers, and certificate authorities.

DID subject
The entity identified by a DID and described by a DID document. Anything can be a DID subject: person, group, organization, physical thing, digital thing, logical thing, etc.

2. Use Cases and Requirements for Decentralized Identifiers Document

Reference: https://www.w3.org/TR/did-use-cases/#uc

3. Subsidiary Ledgers / Digital Wallets

4. Trusted Digital Web (TDW)

5. LinkedIn Economic Graph

Reference: https://www.youtube.com/watch?v=xwawTzMfHic&list=PLU-rWqHm5p45dzXF2LJZjuNVJrOUR6DaD&index=23

6. Toronto Musician’s Economic Graph

7. Enterprise Architecture

8. Web 7.0 / TDW DID Method Clusterization Proposal

GitHub: https://github.com/decentralized-identity/did-methods/issues

Universal DID Methods

DID methods suitable for interacting with Every Little Thing (#ELT) on the planet/in the universe

did:object (http://didobject.directory)
did:ns (http://didns.directory)
did:web7

Web 7.0/TDW DID Method Taxonomy 0.1 (Solution Concept)

The bold method is the model method or exemplar for the particular class. A method can be an exemplar for one or many classes. This list of DID method categories is just an example; a complete taxonomy will likely be a 2-3 level hierarchy. The parent categories for these examples might include: Live Things, Inanimate Things, Abstract Things, Digital Things, Business Things, etc. More than one exemplar per class is permitted.

9. Process Framework: Sociocracy

Reference: https://github.com/decentralized-identity/did-methods/issues/4

In Sociocracy terminology, a mini-WG is called a circle. Each category of DID methods (cluster of DID Methods) would be managed by its own independent circle. A circle member can belong to more than 1 circle. Circles are connected to a parent circle for administrative purposes. The parent circle would correspond to the DID Method WG (co-chaired by Markus).

Sociocracy: https://www.sociocracyforall.org/content/

Sociocracy combines consent decision-making, a decentralized system of authority and intentional processes to improve our decisions and processes over time into a governance system that supports effective and efficient process while increasing connection, listening and co-creation among members.

Sociocracy is used in businesses, communities, nonprofits, cooperatives, grassroots groups and in education. 

10. Questions

IdM Laboratory

An iddance event is coming up. This time the theme is VCs!

Hello, this is Fujie.

It's the end of the year, which means it's iddance season (not really).

https://idance.connpass.com/event/336798/

The theme this time is "I heard VCs have no future?" Quite provocative.

The source of that, by the way, is this:

Sakimura-san's blog

https://www.sakimura.org/2024/11/6488/

It's fully online this time, so it's easy to join.

Do come along.


Ben Werdmüller

Improving Private Signal Calls: Call Links & More

[Signal] Signal has improved its group call functionality pretty significantly: "If you love group calls on Signal, but don’t want to create a group chat for every combination of your friends or colleagues, you’re in luck. Today we’re launching call links: Share a link with anyone on Signal and in just a tap or click they can join the call. No group chat required." This is

[Signal]

Signal has improved its group call functionality pretty significantly:

"If you love group calls on Signal, but don’t want to create a group chat for every combination of your friends or colleagues, you’re in luck. Today we’re launching call links: Share a link with anyone on Signal and in just a tap or click they can join the call. No group chat required."

This is good news, and brings Signal in line with other videoconferencing software. These calls include hand raising, reply emoji, and the other functionality you'd expect to see elsewhere - while being end to end encrypted.

I'm hoping this is a prelude to even more group / workspace functionality. The blog post mentions that Signal's own meetings are Signal-powered (as they should be!), and it's a hop, skip, and a jump from there to powering internal chat with it, too.

This would be a game-changer for any organization that needs to maintain secure comms. It's also a good idea for anyone who conducts regular calls or chats in a group.

Signal is free and open source, is always end-to-end encrypted, and can be downloaded on every major platform.

#Technology

[Link]


Simon Willison

Binary vector embeddings are so cool

Evan Schwartz:

Vector embeddings by themselves are pretty neat. Binary quantized vector embeddings are extra impressive. In short, they can retain 95+% retrieval accuracy with 32x compression and ~25x retrieval speedup.

It's so unintuitive how well this trick works: take a vector of 1024x4 byte floating point numbers (4096 bytes = 32,768 bits), turn that into an array of single bits for > 0 or <= 0 which reduces it to just 1024 bits or 128 bytes - a 1/32 reduction.

Now you can compare vectors using a simple Hamming distance - a count of the number of bits that differ - and yet still get embedding similarity scores that are only around 10% less accurate than if you had used the much larger floating point numbers.
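
As a rough illustration of the trick (not Evan's code), here is what quantization and Hamming comparison look like in plain Python; the vectors below are random stand-ins for real embeddings, and bit_count() needs Python 3.10 or later.

import random

def quantize(vector):
    """Pack each float into a single bit: 1 if > 0, else 0."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Count of differing bits between two packed vectors."""
    return (a ^ b).bit_count()

dim = 1024
v1 = [random.gauss(0, 1) for _ in range(dim)]        # stand-in for a real embedding
v2 = [x + random.gauss(0, 0.1) for x in v1]          # a slightly perturbed copy

q1, q2 = quantize(v1), quantize(v2)
print(f"{dim * 4} bytes per float32 vector -> {dim // 8} bytes per binary vector")
print("Hamming distance:", hamming(q1, q2), "out of", dim, "bits")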

Evan digs into models that this works for, which include OpenAI's text-embedding-3-large and the small but powerful all-MiniLM-L6-v2.

Via lobste.rs

Tags: ai, embeddings


@_Nat Zone

Digital Identity: Latest Developments Around the World – November 12 Edition

Digital identity verification partnership in Portugal

iProov and Biometrid are collaborating to strengthen digital identity verification solutions
Combining iProov's facial verification technology with Biometrid's identity orchestration platform
Targeting the government and financial services sectors, aiming to improve fraud prevention and user experience
Biometrid's platform includes document verification, face matching, and video conferencing features
Integrating iProov's solution assures identity verification against deepfake and impersonation attacks

IDnow Trust Services established in Europe

IDnow has set up IDnow Trust Services AB as a Qualified Trust Service Provider (QTSP) under EU regulation
A joint venture aiming to provide qualified trust services such as electronic signatures and timestamps
IDnow Trust Services AB has been accredited by the Swedish supervisory authority for issuing and managing electronic certificates
Enables signing without SMS, improving user convenience and conversion rates

Identity verification innovation in India

Veriff has introduced FaceBlock to counter synthetic identities and online fraud
FaceBlock uses AI to recognize faces across multiple profiles and prevent the creation of fake accounts
Aadhaar authentication has been added as an optional feature to support KYC and AML measures and comply with India's ID regulations

Block, led by Jack Dorsey, announces the scaling back of tbd, its Web5 initiative

Jack Dorsey's company Block has signaled that it will focus on Bitcoin mining hardware and self-custody crypto wallets. It is scaling back Web5 development and reducing its investment in Tidal. This is tied to the political climate, in particular Donald Trump's election victory, which is expected to create a favorable environment for cryptocurrency. Third-quarter revenue was $5.98 billion, below Wall Street expectations, and the company plans to cut up to 10% of its workforce by the end of 2024. Block does not mine Bitcoin itself, but it sells mining hardware and is developing its own mining computers, having created a 3nm mining chip and partnered with Core Scientific. It is also strengthening Bitkey, a self-custody wallet that works with Cash App and Coinbase to enable BTC transactions.

National digital identity initiatives

Ireland: adopting ID-Pal's solution for digital identity verification in immigration services, enabling online renewals
Namibia: working with UNDP to establish a legal identity system for citizens
North Macedonia: launched the ELI app, which stores digital ID documents and connects to government services
United States: a federal digital ID has been proposed to improve access to unclaimed federal benefits
South Africa: advancing plans for a unique digital ID and instant visas as part of digitalizing the economy
Malta: budgeting investment in a digital ID wallet for access to government services

Global digital identity developments

Google has launched a digital wallet for children, enabling supervised tap-to-pay transactions
SITA and Idemia Public Security are building a decentralized trust network for airport biometrics and digital ID
The Digital Travel Ecosystem aims to improve digital ID interoperability and trust in the travel sector

Simon Willison

Quoting Matt Webb

That development time acceleration of 4 days down to 20 minutes… that’s equivalent to about 10 years of Moore’s Law cycles. That is, using generative AI like this is equivalent to computers getting 10 years better overnight.

That was a real eye-opening framing for me. AI isn’t magical, it’s not sentient, it’s not the end of the world nor our saviour; we don’t need to endlessly debate “intelligence” or “reasoning.” It’s just that… computers got 10 years better. The iPhone was first released in 2007. Imagine if it had come out in 1997 instead. We wouldn’t even know what to do with it.

Matt Webb

Tags: matt-webb, llms, ai, generative-ai

Sunday, 10. November 2024

IdM Laboratory

Digging deeper into Token Status List

Hello, this is Fujie.

Following up on yesterday's IETF retrospective, I'd like to dig a little deeper into each of the drafts.

Yesterday's retrospective post:

https://idmlab.eidentity.jp/2024/11/ietf121-dublin.html

First up is Token Status List.

I took a quick look at the draft document before, but let me add a few updates based on this round of in-person meetings.

https://idmlab.eidentity.jp/2024/10/ietftoken-status-list.html

Basically it's as I wrote before, but here are a few updates.

Requesting history. Although optional, there seems to be a requirement to confirm whether a given token was valid at a specific point in time. To address it, the draft adds support for a query parameter.
This makes the Status List Provider's implementation a bit of a pain...

Whether to support compression algorithms other than Zlib. As I wrote before, the BitString status list is currently compressed with Zlib (a small sketch follows below).
As shown in the slides, the plan is to create a registry of supported compression methods and, for now, register only Zlib.

Whether to require Content-Type on the response. Currently, adding Content-Type to the response is mandatory, but some CDNs don't support it, which is proving to be a pain point.
I think leaving it as-is is fine, but it should be settled by WGLC.

That's about it for Token Status List for now.
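
For reference, a minimal sketch of the mechanism as described above: one bit per token index, compressed with zlib. This is only an illustration; the exact bit ordering and encoding used by the draft may differ.

import zlib

NUM_TOKENS = 1_000_000
status_bits = bytearray(NUM_TOKENS // 8)        # one bit per token, 0 = valid

def revoke(index: int) -> None:
    status_bits[index // 8] |= 1 << (index % 8)

def is_revoked(compressed: bytes, index: int) -> bool:
    bits = zlib.decompress(compressed)
    return bool(bits[index // 8] & (1 << (index % 8)))

revoke(12345)
compressed = zlib.compress(bytes(status_bits), level=9)
print(f"{len(status_bits)} bytes raw -> {len(compressed)} bytes compressed")
assert is_revoked(compressed, 12345) and not is_revoked(compressed, 67890)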




@_Nat Zone

Digital Identity: Latest Developments Around the World – Combined October 29 / November 5 Edition

I skipped the October 29 and November 5 editions because of IIW and IETF, so here is a combined roundup.

Buenos Aires, Argentina: the world's first government-led blockchain ID

The city of Buenos Aires has launched a groundbreaking initiative covering 3.6 million citizens: the rollout of QuarkID, a government-led, blockchain-based digital ID.

Key features of the system:

Uses zero-knowledge proofs on ZKsync's Era layer-2 blockchain
Integrates with MiBa, the city's official digital platform, giving access to various government services and documents
Gives citizens more secure control over their own personal data

All MiBa users receive a decentralized digital identifier (DID) and can now securely manage and share important documents such as birth certificates and tax records. City authorities describe this as a major turning point in personal data management.

(References)

https://www.biometricupdate.com/202410/buenos-aires-moves-from-centralized-to-decentralized-digital-identity-with-quarkid
https://www.thinkdigitalpartners.com/news/2024/10/25/digital-identity-global-roundup-189/

United Kingdom: toward legislation for digital identity verification services

Meanwhile in the UK, the Data (Use and Access) Bill was introduced in the House of Lords on October 23, 2024. The bill includes the following key measures to put digital identity verification services on a legal footing:

Rules for "appropriate" digital identity verification services across the wider economy
Creation of a public register of services that meet the standards
A trust mark so that reliable services can be easily identified
New data-sharing arrangements so that public-sector data can be used for identity and eligibility checks

The announcement was also made under the newly established government office, the Office of Digital Identity and Attribute (OfDIA).

(Reference) https://www.gov.uk/government/news/new-data-laws-unveiled-to-improve-public-services-and-boost-uk-economy-by-10-billion

Poland: rapid uptake of Poland's digital ID

According to the Polish government, mDowód, the digital ID that stands in for the traditional physical identity card, has passed 8 million users.

Key features of this digital ID:

Can be used for the same purposes as a traditional ID card, such as identity verification at banks or notaries
Deputy Minister for Digital Affairs Dariusz Standerski has confirmed it carries the same legal force as a traditional ID card
Comes with its own unique identification number

However, a traditional paper or plastic ID is still required in the following cases:

Applying for a new identity card
Identity checks when crossing borders

mDowód includes the following security features:

An animated Polish flag
A real-time clock display
Works without an internet connection
Information can be verified via a QR code in mObywatel (the official mobile app)

Poland is thus making steady progress in improving the convenience of government services through digitalization. Its approach, balancing security and usability, looks set to become an important reference case for future digital ID adoption.

(Reference) https://www.thinkdigitalpartners.com/news/2024/11/04/digital-identity-global-roundup-190/

United States: Sovrin, a pioneer of decentralized identity infrastructure, to shut down by March 2025

The Sovrin Foundation, a pioneer of self-sovereign identity (SSI), has announced that it intends to shut down its platform on (or before) March 31, 2025.

Sovrin has:

Led the creation of standards for decentralized identity since its launch in 2017
Helped establish the concepts of decentralized identifiers (DIDs) and verifiable credentials later adopted by the W3C
Enabled individual-centric identity management that does not depend on governments or corporations
Operated with trusted organizations around the world participating as "stewards"

It has contributed enormously to the decentralized identity world, but it seems it could not withstand the passage of time and the realities of the market. The main reasons given for the shutdown are:

Declining use of the mainnet
Regulatory uncertainty around decentralized identity
Resources exhausted by dealing with technical challenges
Falling participation in the steward community

The shutdown of Sovrin, which contributed to the development of decentralized identity for seven years, marks the end of an era in this field.

(Reference) https://idtechwire.com/the-community-moved-on-sovrin-announces-mainnets-likely-shutdown/

Italy: digital ID wallet rollout begins, with a pilot ahead of full operation in 2025

The Italian government has launched a pilot program for the country's first digital ID scheme. Through the official "Io" app provided by the Interior Ministry, digital versions of the driving licence, health insurance card, and European Disability Card became available to an initial 50,000 citizens on October 23. The rollout schedule is as follows:

October 23: 50,000 people
November 6: 250,000 people
November 30: 1,000,000 people
December 4: all IO app users

At this stage these digital documents cannot be used with online services or self-service systems, but they can be used for everyday identity checks in the same way as traditional physical documents.

The initiative is positioned as an important step toward the full launch of the "IT Wallet", planned for 2025. Through it, Italy intends to build a system compliant with the EU's digital identity rules (the eIDAS regulation) and push ahead with digitalizing government services.

The pilot program rollout will proceed along the lines above.

Building on the results of the pilot, Italy aims to expand digital ID functionality in stages, working toward more convenient and secure government services.

(Reference) https://www.thinkdigitalpartners.com/news/2024/11/04/digital-identity-global-roundup-190


IdM Laboratory

IETF 121 Dublin overview

Hello, this is Fujie.

After IIW the other day I traveled on to Dublin via Germany and attended IETF 121 Dublin. My approach was basically to sit in on the Web Authorization Protocol (in other words, OAuth) sessions while dropping in on whichever side meetings looked interesting. Since it was my first time, I mostly just observed and didn't take detailed notes, so please refer to the OAuth WG mailing list and similar sources for the substance.

The OAuth WG agenda is here: https://datatracker.ietf.org/doc/agenda-121-oauth/

Plenty of topics:

Token Status List
Attestation-based Client Authentication
Transaction Token
Extending RFC8414 with client-specific responses
OAuth Identity and Authorization Chaining Across Domains
Identity Assertion Authorization Grant
OAuth 2.1 update
OAuth Client ID Metadata Document
First Party Apps
SD-JWT
SD-JWT-VC
One-time confirmation tokens
One-time auth tokens

I was told that this is simply how the IETF works, but it really is a process of bringing fairly half-baked drafts and having everyone hammer on them. That was refreshing.

A few things that left an impression on me.

Token Status List

This is the one I covered here recently: https://idmlab.eidentity.jp/2024/10/ietftoken-status-list.html

My impression is that this one has firmed up considerably. Still, using a BitString is tricky, unavoidable as that may be: without some scheme such as garbage collection it will eventually run out of space, and the question of how exactly to garbage-collect it still seems open (or rather, an operational matter).

OAuth Client ID Metadata Document

This is essentially SP Metadata in SAML terms... Whether it coexists with or replaces Dynamic Client Registration looks like the key question. My sense is it may end up being used around FastFed.

SD-JWT / SD-JWT-VC

The discussion about what to do with cty/typ was interesting. It started out as vc+sd-jwt, but with W3C VC DM 2.0 moving to things like vc+ld+json, the media types are proliferating. With various grown-up considerations in play, there was a proposal to settle on dc+sd-jwt. Hmm. Honestly, at this point, prefixing with usage-flavored labels like vc or dc looks like a bad move to me... Well, it's an important area, so I'll keep watching.

The next meeting is in Bangkok, so discussion will continue online in the meantime. I plan to keep following it.

Dublin is a lovely place, though. I'd like to go back.

Trinity College

Draft Guinness. I can never go back to cans.


Saturday, 09. November 2024

Ben Werdmüller

PSA: Social media privacy and you

I’ve noticed a few mistaken assumptions circulating on social media lately, so I thought it was worth clarifying a few things around privacy and surveillance.

Much of this depends on the idea of a threat model: a term that refers to the potential risks you face based on who you think might try to access your information and why.

Making a social media profile private does not make it safe from surveillance.

While it may make you safe from harassment by preventing drive-by comments from outside attackers, its content is still accessible by the platform owner.

For centralized services like Threads and X, this is hopefully obvious: the platform owner can see your content. However, it’s also true on other platforms. For example, the owner of your Mastodon instance could theoretically view your non-public posts.

If your main concern is harassment, setting your account to private can be a helpful step. If your threat model is a state actor or other large entity accessing your information and using it to incriminate you in some way, it does not prevent that from happening if the social media platform co-operates. For example, if X was compelled (or chose) to provide information about users posting about receiving reproductive healthcare, it could do that regardless of an account’s privacy settings. Threads or a Mastodon instance could similarly be subpoenaed for the same information.

Remember, even with privacy settings in place, your data belongs to the platform owner, not you. This is a critical point to understand in any digital space, regardless of ownership or whether it is centralized or decentralized. Even if a platform is decentralized, privacy still depends on who runs your instance, their stance on co-operating with outside requests for information, and the legal demands of the region they reside in.

If a platform chooses to co-operate, a warrant is not necessarily required for this information, and you may never find out that it has happened.

Decentralized/federated social networks are not free from surveillance.

These platforms are based on permissionless protocols, which allow anyone to join the network and interact without needing special permissions from anyone. This is great for accessibility but can also make it easier for bad actors to watch public posts.

In some ways, that makes them easier to surveil than centralized services. For an actor to surveil X or Threads, they would need to work with the platform owner. For an actor to do the same thing with Mastodon or Bluesky, they simply need to implement the protocol and go looking.

This is where making your account private can help, as long as the platform owner is not directly co-operating. (As described above, if a platform owner does co-operate, all data stored with them is potentially accessible.) If your account is public, your information can be freely indexed with no limitations.

Social media is not suitable for sensitive conversations.

As we’ve seen, privacy settings are helpful but limited. For truly sensitive conversations, it’s wise to switch to encrypted channels. You should also be mindful of what you share on any social platform, even with privacy settings enabled.

I always recommend Signal for sensitive conversations, and suggest using it to replace DMs entirely. You’re much more likely to use it for a sensitive conversation if you’re already using it for every conversation. Unlike the alternatives, it’s open source and auditable, not owned by a large corporation, end-to-end encrypted, works on every platform, and is very easy to use.

You should also consider using Block Party, which is the most user-friendly tool I’ve seen for locking down your social media privacy settings.

In the end, privacy settings can only go so far. Using a platform like Signal can make a meaningful difference in safeguarding your most sensitive information. It’s a free, simple choice. But even more than that, it’s worth remembering: the point of social media is that someone is always watching. Act accordingly.


Journalists - is Twitter actually your community?

[Damon Kiesow in Working Systems]

Damon Kiesow, who is the Knight Chair in Digital Editing and Producing at the Missouri School of Journalism, writes:

"We ourselves have torn down the wall between editorial and business interests if as journalists, our calculation here is not values-based. To wit: “But I have a large following and neither BlueSky or Threads does.” That is the rationalization of a marketer, not a journalist who believes in the SPJ Code of Ethics dictate to “minimize harm.”"

The questions Damon raises in this post are the right ones. It's long past time for journalists to interrogate their uses of social media and whether they're doing harm, and I deeply appreciate the callout to the SPJ Code of Ethics as a core principle here.

Alternatives are available that don't have these toxic traits and are more engaged, less dangerous for your community, and a part of the future of the web rather than a relic of the past. Use them.

#Media

[Link]


Accessible tips for people to protect their privacy

[Bill Fitzgerald]

Bill Fitzgerald has updated his open source guide to personal privacy:

"Conversations about privacy and security often focus on technology and give scant attention to the human, non-technological factors that affect personal privacy and security. This post covers a range of concrete steps we can all take to regain control over what, when, and with whom we share."

I really appreciate the straightforwardness of the guide - these are things that everyone can do to help keep themselves safe. And because it's open source, the more eyes there are on it validating the information, the better the guide will get.

Some of the general advice is needfully pessimistic but doesn't always apply. For example, it talks about there not being an expectation of privacy on work devices, or using a work-provided VPN. That probably is generally true, but for example, in my role leading technology at ProPublica, I and others would absolutely flip a table if we decided to surveil our employees. (For one thing, that would be a terrible approach if we cared about keeping sources safe, which we obviously do.) So it's always worth checking in with your IT leadership to understand their concrete policy.

Regardless, I would feel comfortable sharing this verbatim. I'm grateful that Bill has released it under a Creative Commons Attribution Share-Alike license, so there are opportunities to create designs for this guide and share them back to the community.

#Technology

[Link]


IdM Laboratory

FAPI 2.0 conformance tests now support DPoP

Hello, this is Fujie.

It looks like the FAPI 2.0 conformance tests now support DPoP.

https://openid.net/fapi-2-0-conformance-tests-support-dpop/

FAPI adoption in Japan still has a long way to go, but more and more countries, such as Brazil, Saudi Arabia, and the UAE, are taking ambitious approaches, so I hope it gets adopted beyond finance as well. In that sense, IPSIE may be something to look forward to.

Friday, 08. November 2024

Ben Werdmüller

Wind the clock

[Molly White]

Unsure what to do now? Molly White has some solid ways to get started helping:

""Many of us have looked back on historic events where people have bravely stood up against powerful adversaries and wondered, “what would I have done?” Now is your chance to find out. It did not just start with this election; it has been that time for a long time. If you’re just realizing it now, get your ass in gear. Make yourself proud.""

There are compelling suggestions here around protecting yourself; working to support press freedom and access to information; migrant rights; reproductive rights; trans rights. But more than that, the spirit of this post is that we should have a bias towards meaningful action.

#Technology

[Link]


Heres Tom with the Weather

Webfinger in the Wild

Today, a post in my feed included a mention and its webfinger verification threw a WebFinger::BadRequest exception:

Nov 08 09:18:49 AM WebFinger::BadRequest (Bad Request):
Nov 08 09:18:49 AM
Nov 08 09:18:49 AM app/models/account.rb:79:in `fetch_and_create_mastodon_account'
Nov 08 09:18:49 AM app/models/account.rb:367:in `block in create_status!'
Nov 08 09:18:49 AM app/models/account.rb:364:in `each'
Nov 08 09:18:49 AM app/models/account.rb:364:in `create_status!'
Nov 08 09:18:49 AM app/lib/activity_pub/activity/create.rb:20:in `perform'
Nov 08 09:18:49 AM app/controllers/accounts_controller.rb:148:in `process_item'
Nov 08 09:18:49 AM app/controllers/accounts_controller.rb:75:in `inbox'

The activitypub actor document resided on mastodon.well.com but when a reverse discovery was performed, the hostname of the subject in the webfinger response was well.com instead of mastodon.well.com. Making a webfinger request to well.com for the mentioned user returned a 500 Internal Server Error so a WebFinger::BadRequest exception was thrown. What was going on?

Fortunately, an issue in the activitypub-webfinger had the answer:

Looks like some are using this host-meta redirect to use a custom domain for actors which is different to the actual domain of the server.

And that is what was happening:

curl https://mastodon.well.com/.well-known/host-meta
<?xml version="1.0" encoding="UTF-8"?>
<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">
  <Link rel="lrdd" template="https://mastodon.well.com/.well-known/webfinger?resource={uri}"/>
</XRD>

A response in the issue notes

The use of host-meta as a “second layer of indirection” is something that mostly a holdover from the OStatus days, IIRC. Most projects that aren’t Mastodon or Pleroma will not check host-meta at all, and will instead always skip straight to the /.well-known/webfinger endpoint. I don’t think it makes sense to unnecessarily pressure everyone into adopting host-meta or supporting variable LRDD endpoints

I can’t argue with that so I just handled the exception without setting the custom domain.
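To make the reverse-discovery step concrete, here is a rough Go sketch, not the Ruby code from the stack trace above, and with a purely hypothetical account: it fetches the WebFinger document for an account and returns the subject, which the caller can then compare against the host serving the actor document. When that direct query fails, a stricter client could fall back to /.well-known/host-meta and retry with the advertised lrdd template.

package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// webfingerSubject fetches /.well-known/webfinger for acct on host and
// returns the "subject" value from the response.
func webfingerSubject(host, acct string) (string, error) {
	wfURL := fmt.Sprintf("https://%s/.well-known/webfinger?resource=%s",
		host, url.QueryEscape("acct:"+acct))
	resp, err := http.Get(wfURL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("webfinger on %s returned %d", host, resp.StatusCode)
	}
	var doc struct {
		Subject string `json:"subject"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&doc); err != nil {
		return "", err
	}
	return doc.Subject, nil
}

func main() {
	// Hypothetical account: the actor document lives on mastodon.well.com,
	// but the canonical subject may point at a custom domain (well.com).
	subject, err := webfingerSubject("mastodon.well.com", "someone@well.com")
	if err != nil {
		// This is where a client could fall back to /.well-known/host-meta
		// and retry using the lrdd template it advertises.
		fmt.Println("webfinger lookup failed:", err)
		return
	}
	fmt.Println("subject:", subject) // e.g. acct:someone@well.com
}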


Ben Werdmüller

One Practical Thing: Protect Information


[Heather Bryant]

Some good advice from Heather:

"One practical thing you can do in as much depth as you like, identify a particular area of information that you care about or feel is important and protect it. Whether it's critical public data, old abandoned websites, or niche community content that you think is worth preserving, the information is worth saving."

There is lots of practical advice in her piece: contributing to ArchiveTeam and to the End of Term Web Archive, downloading a copy of Wikipedia, and simply keeping a copy of useful information. I agree - particularly in a world where we're all so dependent on storing things in the cloud. The longevity of all of that information matters.

#Technology

[Link]


Building the news


A question many of us are asking: how can we be as effective as possible over the next four years?

A few years ago, I made the decision to move out of tech into non-profit news, and I'm glad I'm here. It's a different environment and the learning curve has sometimes been steep, but I strongly believe in the power of mission-driven investigative journalism and journalism centered in diverse perspectives to strengthen democracy. And it sure feels like democracy could use some strengthening.

My career has been driven by building open source platforms that offer alternatives to centralized services. I believe news is in dire need of these alternatives. To reach audiences and make an impact, newsrooms are currently dependent on companies like X, Google, and Apple - and therefore subject to their changing business decisions. This particularly matters in a shifting business landscape and a new political order that may create a more adverse environment for journalism.

For technologists - engineers, product leads, designers - there is an opportunity to help build great platforms that serve both newsrooms and audiences, and therefore democracy.

For newsrooms, there is an opportunity to invest in new platforms that will give you more autonomy and help you build deeper relationships with your audience. (Hint: newsletters are great but don't go far enough - and what happens when everybody's inboxes are managed by AI? That future is coming.)

I've been trying to work on creating space to bring these groups together. More on that later. But I think this is the work: news needs to invest in platform, and platform builders need to work in news. I've often complained that journalism treats technology like something that just happens to it, rather than owning and building it; now, this lack of ownership and strategy is becoming an existential threat.

This post was originally published on LinkedIn.


Phil Windleys Technometria

Internet Identity Workshop XXXIX Report

The 39th edition of the Internet Identity Workshop wrapped up last week.

The 39th edition of the Internet Identity Workshop wrapped up last week. We had 364 attendees from around the world who called 178 sessions. I can't begin to describe the energy of the identity community when they all get together to discuss their favorite topics. If you like identity, or simply want to be part of an enthusiastic technical community, you should come to IIW.

As you can see by the pins in the map at the top of this post, there were attendees from all over the world. Not surprisingly, most of the attendees were from the US (251), followed by Canada (18) and France (14). Germany, Japan, and Australia rounded out the top six with 12, 9, and 7 attendees respectively. Attendees from India (5), Colombia (3), and Chile (2) show IIW's geographic diversity. Sadly, there were no attendees from Africa again. Please remember we offer scholarships for people from underrepresented areas, so if you'd like to come to IIW40, please let us know. If you're working on identity, we want you there.

For states and provinces, California was first with 131 attendees. Washington (19), Utah (14), New York (9), and Massachusetts (9) made up the rest of the top 5. San Jose (20), San Francisco (16), Paris (12), Oakland (11), and Seattle (9) were the top five cities.

We'll have the book of proceedings out in a month or so with notes from the different sessions and descriptions of the 20 demos given during demo hour. Past proceedings are available here.

The next IIW takes place April 8-10, 2025 at the Computer History Museum. This will be IIW XL, number 40! We'll have registration open the first part of December. If you're interested in sponsoring, send me a note.



IdM Laboratory

[Translation Supervision] デジタルアイデンティティのすべて (Japanese edition of Learning Digital Identity)

Hello, this is Fujie.

The day has finally come when I can make this public.

"デジタルアイデンティティのすべて" (Everything About Digital Identity): the knowledge needed to build secure, user-centered identity systems

https://www.ohmsha.co.jp/book/9784814400980/

It will be published on December 27 and will be available on Amazon, Rakuten Books, and elsewhere.

This is the Japanese translation of Phil Windley's recent book "Learning Digital Identity", and I served as its technical reviewer. Huge thanks to the volunteers who did the translation (there is still a little left to do).

This is the original: https://amzn.asia/d/hHKHKnx

The cover of the Japanese edition is out as well (it is already listed on Rakuten Books).

At IIW I also collected some comments from Phil, so I expect to be able to share them with everyone at an event sometime after publication. At the end of last month's IIW, I recorded a message from Phil for everyone in Japan.

Stay tuned!




Thursday, 07. November 2024

John Philpin : Lifestream

🔗📼🎵 Revolution … Did it happen? Is it happening? Will it hap

🔗📼🎵 Revolution … Did it happen? Is it happening? Will it happen?

🔗📼🎵 Revolution … Did it happen? Is it happening? Will it happen?


Ben Werdmüller

Why Democrats won't build their own Joe Rogan


[Taylor Lorenz at User Mag]

Gen Z men have lurched to the right, which was one factor behind this month's election result. This is, in part, because they've been inundated with media that speaks to a right-leaning point of view - and there's almost no counterpart on the Left.

"Leftist channels do not receive widespread financial backing from billionaires or large institutional donors, primarily because leftist content creators support policies that are completely at odds with what billionaires want."

There is no progressive answer to Rogan. There could be - there's no shortage of progressive hosts who could fill this role - but as Taylor argues here, and as I've also previously argued, the funding isn't there for it.

As Taylor points out:

"The conservative media landscape in the United States is exceptionally well-funded, meticulously constructed, and highly coordinated. Wealthy donors, PACs, and corporations with a vested interest in preserving or expanding conservative policies strategically invest in right-wing media channels and up and coming content creators."

For progressive causes to win, there must be investment in progressive influencers. Not in a cringe Air America way, but authentic voices who are already out there and need a lift to reach more audiences. So the question becomes: where are those progressive influencers? And who can bankroll them in such a way that they retain their independence and authenticity - but amplified?

#Media

[Link]


The Pragmatic Engineer

The Pulse #114: What does Trump’s win mean for Tech?

More influence for Musk and VCs, potentially reversing Section 174, pro-crypto, uncertainty for Big Tech, and more. Also: a third embarrassing security issue for Okta in two years

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

This issue focuses on the US election result, and there’s no Industry Pulse for one week only. In this issue:

What does Trump’s win mean for the tech industry?

Yet another bizarre Okta security vulnerability

Let’s get into it!

1. What does Trump’s win mean for the tech industry?

On Tuesday, the US elected Donald Trump as president for a second time, defying pollsters who failed to predict his decisive victory, with the Republican party also taking control of the Senate. Based on what we know about Trump and the party he leads, we can look for some clues about the coming near-half decade of his administration’s impact on the US tech sector.

Elon Musk, a16z, and VCs bid for influence

One of the biggest Trump supporters – both in publicity and money terms – is Elon Musk. Other prominent venture capitalists who campaigned for Trump include a16z founders Marc Andreessen and Ben Horowitz, and investor Peter Thiel. It is all but certain that these groups will have a say in shaping national tech policies. Expect them to lobby on issues such as:

Less antitrust for acquiring tech companies: VCs will most certainly be lobbying to allow Big Tech companies to acquire VC-funded startups without regulatory scrutiny, which is in the interest of venture capitalists by opening up more exit paths. The FTC’s blocking of Adobe’s $20B acquisition of Figma in 2023 led to large tech companies “poaching” founders and key employees from AI startups – like Google essentially acquihiring Character.ai, Microsoft basically acquihiring Inflection AI, and Amazon doing a “talent raid” with AI startup Covariant and Adept AI.

AI regulation: Elon Musk has a large stake in xAI, and could push to stop any AI regulation which doesn’t make it harder for newer entrants to enter. Other players like OpenAI and Anthropic will likely lobby for regulation that doesn’t impact their business.

Will Trump reverse his own Section 174 change?

A Trump administration with full control of the Senate creates an opportunity to reverse Section 174, a tax rule that makes US tech companies less competitive globally. In short, under this rule, software engineering salaries cannot be expensed in full in the year they are paid, as they can be in other countries; they must be amortized over 5 years, or over 15 years when employing software developers abroad. We cover more in a deep dive.
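To see why this matters for cash flow, here is a deliberately simplified Go sketch comparing first-year deductions on $1M of engineering payroll. It ignores the mid-year convention and every other detail of the tax code, so the numbers are illustrative only.

package main

import "fmt"

func main() {
	// Simplified illustration only; real Section 174 amortization uses a
	// mid-year convention and many other rules this sketch ignores.
	salaries := 1_000_000.0 // annual US software engineering payroll

	expensed := salaries // pre-2022 style: deduct in full in year one
	domestic := salaries / 5  // amortized over 5 years (US-based developers)
	foreign := salaries / 15  // amortized over 15 years (developers abroad)

	fmt.Printf("Year-1 deduction if fully expensed: $%.0f\n", expensed)
	fmt.Printf("Year-1 deduction, 5-year schedule:  $%.0f\n", domestic)
	fmt.Printf("Year-1 deduction, 15-year schedule: $%.0f\n", foreign)
}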

Read more


Ben Werdmüller

Social media is a symptom, not a cause


[Mathew Ingram at The Torment Nexus]

Mathew Ingram on blaming social media for the stratification of society:

"In the end, that is too simple an explanation, just as blaming the New York Times' coverage of the race is too simple, or accusing more than half of the American electorate of being too stupid to see Trump for what he really is. They saw it, and they voted for him anyway. That's the reality."

This piece does a good job of debunking the lingering idea that "fake news" swings elections, or that social media bubbles are responsible for multiple realities and "alternative facts". In fact, this is a process that has been ongoing since the 1990s, and social media is a mirror of it rather than the cause.

If you're looking for answers, you need to look elsewhere.

#Media

[Link]


Werdmüller on Medium

A hypothesis about the Democrats

Have they been culturally outmaneuvered? Continue reading on Medium »

Have they been culturally outmaneuvered?

Continue reading on Medium »


IdM Laboratory

The Call for Presentations for Identiverse 2025 Is Open

Hello, this is Fujie.

The other day I mentioned the call for speakers for the European Identity and Cloud Conference; this time it is Identiverse 2025's turn.

https://app.sessionboard.com/submit/new-conference/9329e74a-ca9b-4a19-8458-d2ea6ed5a752

The conference itself runs June 3-6, 2025, and the presentation deadline is January 3, right after the new year. Submit early!


John Philpin : Lifestream

🔗📼🎵 Do you care about your rights?

🔗📼🎵 Do you care about your rights?

🔗📼🎵 Twice In A Lifetime - Apparently

🔗📼🎵 Twice In A Lifetime - Apparently

🔗📼🎵Have you lost your religion? What is your ‘religion’ anyw

🔗📼🎵Have you lost your religion? What is your ‘religion’ anyway? It should probably just be taken away - let’s not leave it to chance.

🔗📼 You can still rock - but will it be in the free world

🔗📼 You can still rock - but will it be in the free world

Wednesday, 06. November 2024

John Philpin : Lifestream

🔗📼 Oh Well

🔗📼 Oh Well

🔗📼 During which time … A Hard Rain’s Gonna Fall

🔗📼 During which time … A Hard Rain’s Gonna Fall

🔗📼 Burning Down The House Your song of the day, week, mont

🔗📼 Burning Down The House Your song of the day, week, month, year, 4 years.

🔗📼 Burning Down The House

Your song of the day, week, month, year, 4 years.


The Pragmatic Engineer

Promotions and tooling at Google (with Irina Stanescu, Ex-Google)

In today’s episode of The Pragmatic Engineer, I’m joined by Irina Stanescu, a seasoned engineer with over 14 years in software engineering and engineering leadership roles at tech companies like Google and Uber.

Listen now on Apple, Spotify, and YouTube.

Brought to you by:

WorkOS — The modern identity platform for B2B SaaS.
Sonar —  Trust your developers – verify your AI-generated code.

In today’s episode of The Pragmatic Engineer, I’m joined by Irina Stanescu, a seasoned engineer with over 14 years in software engineering and engineering leadership roles at tech companies like Google and Uber. Now an engineering leadership coach, Irina helps tech professionals build impactful careers, teaches a course on influence, and shares insights through her newsletter, The Caring Techie. In our conversation today, Irina shares her journey of rising through the ranks at Google and Uber. We dive into the following topics: 

An inside look at Google’s unique working processes

How to build credibility as a new engineer

Tactical tips for getting promoted 

The importance of having a career plan and guidance in designing one

Having influence vs. influencing—and how to become more influential 

Essential leadership skills to develop

And so much more

Takeaways

My biggest takeaways from this conversation:

1. Google is different in its tooling and processes from most tech companies. While it's common enough to hear from current and former Googlers just how many custom internal systems Google has, Irina named several of them, which made the point more concrete. She mentioned ones like Borg (deployments), Critique (code reviews) and Memgen (meme generator). There's often no direct mapping of these tools at other companies.

Google also has processes around code quality that few, if any companies put in place. The concept of a “readability review” code reviewer is one of these. It shows just how much Google values clean and easy-to-understand code, to the point that they created a pretty heavyweight process to ensure code readability remains a top priority.

2. Being rejected for a promotion doesn't necessarily mean that you'll be slow to be promoted later. It sounded to me like Irina was very disappointed when her first promotion application, from L3 to L4, was rejected. At the same time, after she got promoted to L4, it took her a mere year to reach L5 – an unusually short time at a place like Google! What happened?

As Irina shared: after her first promotion rejection, she was more active in looking out for teams where she could have a bigger impact. When she joined a newly created team and there was a ton of work to do, she stepped up and did all this work.

My read is that thanks to staying longer at Google, she understood more about how the company works; her tenure surely helped her get promoted, as did doing great work. It's a reminder that it can be worth gritting your teeth and pushing through situations that don't go your way, and turning things around later!

3. If you want to influence others: start by building credibility! Think about peers who you consider “influential:” why do you listen to what they say? There is a fair chance that those tech professionals have a track record of getting things done, their work is visible to you, and they are active participants in both discussing things and getting things done. So, if you would like your peers – or people on other teams – to see you as influential, you also need to put in the groundwork!

Irina suggests asking yourself the question: “Why should people listen to you?” And figure out the answers to these questions as well:

What do people around you know about you?

How do you present your ideas?

How frequently do you listen to people who you would like to influence?

Influence is a two-way street: some of the most influential tech professionals not only get things done, but they are good at listening to others and changing their minds and approaches if and when the other person has good points.

Timestamps

(00:00) Intro

(01:34) Irina’s time at Google

(03:10) An overview of ‘design docs’ at Google

(08:27) The readiness review at Google

(10:40) Why Irina uses spreadsheets

(11:44) Irina’s favorite tools and how she uses them

(13:46) How Google certifies readability

(15:40) Google’s meme generator 

(17:36) Advice for engineers thinking about working for an organization like Google

(20:14) How promotions work at Google

(23:15) How Irina worked towards getting promoted 

(27:50) How Irina got her first mentor 

(30:44) Organizational shifts at Uber while Irina and Gergely were there

(35:50) Why you should prioritize growth over promotion

(36:50) What a career plan is and how to build one

(40:40) Irina’s current role coaching engineers 

(42:23) A simple explanation of influence and influencing 

(51:54) Why saying no is necessary at times

(54:30) The importance of building leadership skills

References

The Pragmatic Engineer deepdives relevant for this episode:

Preparing for promotions ahead of time 

Engineering career paths at Big Tech and scaleups

Getting an Engineering Executive Job

The Seniority Rollercoaster

Where to find Irina Stanescu:

• X: https://x.com/thecaringtechie

• LinkedIn: https://www.linkedin.com/in/irinastanescu/

• Website: The Caring Techie Newsletter by Irina Stanescu: "Most people know the potential they carry, but few reach it. This newsletter teaches you how. Lessons from my experience at Google, Uber, and startups."

• Maven course: Impact through Influence in Engineering Teams: https://maven.com/irina-stanescu/influence-swe

Mentions during the episode:

• Google Fiber: https://fiber.google.com/

• Design docs at Google: https://news.ycombinator.com/item?id=40273534

• Code Search: https://developers.google.com/code-search

• Inside Google's Internal Meme Generator: https://www.buzzfeednews.com/article/reyhan/inside-googles-internal-meme-generator

• Robert Waldinger: What makes a good life? Lessons from the longest study on happiness | TED:

• Think Again: The Power of Knowing What You Don't Know: https://www.amazon.com/Think-Again-Power-Knowing-What/dp/1984878123/

• The Midnight Library: https://www.amazon.com/Midnight-Library-Novel-Matt-Haig/dp/0525559493/

• Golang: https://go.dev/

• gRPC: https://grpc.io/

• Software Developer Promotions: Advice to Get to That Next Level: https://blog.pragmaticengineer.com/software-engineering-promotions/

• Preparing for Promotions Ahead of Time: https://newsletter.pragmaticengineer.com/p/preparing-for-promotions

• Getting an Engineering Executive Job: https://newsletter.pragmaticengineer.com/p/getting-an-engineering-executive

• The Seniority Rollercoaster: https://newsletter.pragmaticengineer.com/p/the-seniority-rollercoaster

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.


Werdmüller on Medium

We can still rise

The election was bleak. But we have a future. Continue reading on Medium »

The election was bleak. But we have a future.

Continue reading on Medium »


Heres Tom with the Weather

Indieweb Meetup Austin today

After some time off IndieWeb Meetup Austin is back at Mozart’s Coffee tonight at 7pm. I guess a few things have transpired since the last meeting so I’m looking forward to learning about what’s going on again.

After some time off IndieWeb Meetup Austin is back at Mozart’s Coffee tonight at 7pm. I guess a few things have transpired since the last meeting so I’m looking forward to learning about what’s going on again.


IdM Laboratory

Voting on the AuthZEN WG Authorization API 1.0 Implementer's Draft Starts Soon

Hello, this is Fujie.

The AuthZEN WG's Authorization API 1.0 Implementer's Draft, which entered its public review period in September, goes to a vote next week, November 7-14.

https://openid.net/notice-of-vote-implementers-draft-authzen-authorization-api-1-0/

Standardizing authorization has been a long-standing dream of humanity, so take this opportunity to study the spec and send in your feedback.

I have written an overview of the API here.
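For context, the heart of the Authorization API is a policy enforcement point asking a policy decision point whether a subject may perform an action on a resource. The Go sketch below reflects my rough reading of the draft's request and response shape; the endpoint path, host, and field names here are assumptions, so check the Implementer's Draft before relying on them.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// EvaluationRequest approximates the draft's subject/action/resource triple.
type EvaluationRequest struct {
	Subject  map[string]string `json:"subject"`
	Action   map[string]string `json:"action"`
	Resource map[string]string `json:"resource"`
}

func main() {
	req := EvaluationRequest{
		Subject:  map[string]string{"type": "user", "id": "alice@example.com"},
		Action:   map[string]string{"name": "can_read"},
		Resource: map[string]string{"type": "document", "id": "123"},
	}
	body, _ := json.Marshal(req)

	// "pdp.example.com" and the exact path are placeholders; consult the
	// Implementer's Draft for the authoritative endpoint and field names.
	resp, err := http.Post("https://pdp.example.com/access/v1/evaluation",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()

	var result struct {
		Decision bool `json:"decision"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&result); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("decision:", result.Decision)
}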





Doc Searls Weblog

Now What?


It used to be When.

But that was yesterday: election day in the U.S.

In California, where I voted (by mail), it's still 10:30 PM, and the Blue folk are especially blue, because the whole thing is over. Trump hasn't won yet, but he will.

I correctly predicted a Trump win in 2016, a loss in 2020, and a win again in 2024. That was, until Biden dropped out. Harris was a much stronger candidate than I—and nearly everyone—expected.

Here in Paris, I went to bed last night expecting a Harris win. When I got up at 4 AM, it was clear that she would lose. Badly. And now, at 7:45 AM, it looks like the GOP will take the Senate and perhaps also the House.

I have always avoided politics on this blog, and even more on social media. But I have always had a lot to say. It is not, however, what others are saying, so don’t guess at it.

What I will say for now is that Democrats need a new story. Or a collection of them.

I’ve talked about stories before. This TEDx talk is one place. This blog post is another.

I’ll be expanding on points made in both. Meanwhile, coffee and croissants await.

 

 


John Philpin : Lifestream

💬

💬

💬

Tuesday, 05. November 2024

The Pragmatic Engineer

How to become a more effective engineer

The importance of soft skills, implicit hierarchies, getting to “small wins”, understanding promotion processes and more. A guest post from software engineer Cindy Sridharan.

Hi – this is Gergely with the monthly, free issue of the Pragmatic Engineer. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get weekly emails like this in your inbox, subscribe here:

Subscribe now

This article is a guest post. Interested in potentially writing one in The Pragmatic Engineer? Details on expressing interest.

Today happens to be election day in the US: the biggest political event in four years. While we will not discuss that kind of politics in this publication, this event is a good excuse to discuss the other type of politics: workplace politics, specifically for software engineers and engineering leaders.

Cindy Sridharan is a software engineer working in the Bay Area. I originally connected with Cindy years back, online, over distributed systems discussions, and we met in-person last year in San Francisco. As the topic of internal politics for software engineers came up, Cindy, frustrated with the kind of careless, non-productive discourse that swirled around this topic, wrote an article about it, back in 2022.

The article really resonated with me – and with other people I shared it with. So with the permission and help of Cindy, this is an edited and updated version of Cindy’s original article.

In this issue, Cindy covers:

Know how your org works

Soft skills: these are hard skills!

Implicit hierarchies

Cultures: top-down, bottom-up, and both at the same time

Get comfortable with the “mess”

Look for small wins

Understand organizational constraints

As related reading, see these The Pragmatic Engineer Deepdives:

Internal politics for software engineers and managers: Part 1

Internal politics for software engineers and managers: Part 2 

Engineering career paths at Big Tech and scaleups

Seven tactics to become a more effective software engineer. None are about coding

With this, it’s over to Cindy:

Some time ago, exhausted by never-ending complaints about self-defeating reward structures at companies, I made what seemed to me a fairly self-evident comment:

Cindy Sridharan on X

Some of the responses this comment garnered were, well, rather pointed. Most people’s dismay seemed to have stemmed from what they’d perceived to be my dismissiveness towards their well-intentioned albeit ultimately not very fruitful efforts to make things better at their workplace. 

I’ve been meaning to expand on some of my thoughts on this topic for months, since I feel this warrants a more nuanced and considered discussion than is feasible on social media.

This post aims to lay out some problems engineers might often encounter when trying to address causes of dysfunction at their companies. It offers some food for thought on how to be more effective working within the limitations and constraints of organizations.

One caveat I need to mention is that most of what I describe here is from the perspective of an individual contributor (IC). I’ve never been a manager and have no experience of navigating organizational politics as a manager. There are innumerable resources by seasoned managers on how to maneuver managerial politics, for those interested.

Preface: The distant mirage of aspirational ideas

It’s something of a rule of thumb that on social media, topics that generally require careful consideration are painted with reductionist, impractical, or aspirational brushstrokes. This is often done by people with very high levels of visibility, and sometimes by people who really ought to know better. Much of this oversimplified and irresponsible discourse gets excessively amplified, to the degree that it can very quickly become what’s perceived as “conventional wisdom”. None of this is productive. Worse, it gives easily influenced people the wrong idea of how organizations “must” function.

It can be quite discouraging to see aspirational goals get elevated to such heights that anything that falls short of their high standards is often deemed as “toxic” or “dysfunctional.”

Technical debt is a common talking point, so let’s take this as a concrete example. The accumulation of technical debt as teams prioritize building new features at a rapid pace, even if it comes at the expense of quality, performance, testing and so forth: this is a very common occurrence. As an industry, we’ve not built the tools, framework, or even an effective vocabulary required to talk about these tradeoffs, beyond simply calling it “technical debt”. As a result, most conversations around technical debt end up being oddly confusing. People are often  disappointed about how “leadership doesn’t get tech debt” or about how features are always prioritized over critical maintenance work.

Yes, ideally we should have a culture which prioritizes minimizing technical debt and building software sustainably, not just shipping features. But you’d be hard-pressed to find a single team or organization that prioritizes addressing technical debt as the primary focus of the team for a longer period of time. If and when technical debt does get prioritized as the primary focus of the team, it’s often because the technical debt has a noticeable and negative impact on a key, well-tracked, highly visible metric that reflects poorly on the team.

If your team is hitting all deliverables on time, then there might be an appetite for addressing the issue of technical debt in fits and starts. But in the vast majority of cases, addressing technical debt needs to be undertaken iteratively. You need to initially aim for small and easy wins that inspire confidence and lay the groundwork for you to push for bigger and better improvements. And you need to do all of this without slowing down your team’s delivery pace. Preferably without having protracted conversations with “leadership” to get necessary buy-in to do so.

Social media, blog posts and conferences amplify aspirational ideas (if leadership just “gets” why technical debt is so harmful and “prioritizes” it, then we can easily address this problem). Your organization, however, rewards what you actually get done which benefits the organization. This might be a very far cry from whatever might be de rigueur on social media.

1. Know how your org works

One of the most effective things you can do to be successful at your job is to understand how your organization works. This understanding will better inform your outlook on all things, including:

exactly what technical skill you need to invest effort into getting better at, which will actually be rewarded

how to build lasting relationships with other people on your team or organization that ultimately dictate the success of a project

how to effectively pitch projects or improvements to leadership and actually see these through to completion

how to navigate ambiguity

how to manage conflicting priorities or expectations

how to best deal with setbacks

how to weigh the pros and cons of technical choices in the larger context of the organizational realities and needs

how to identify and drive quick wins

how to discern what’s achievable, and in precisely what time frame

how to use this knowledge to judiciously pick battles

and in the worst case, to know when to cut your losses and quit

Managers need to deal with these skills as a part of their job description and so do ICs at the very senior levels. But it’s never too early in your career to start cultivating this knowledge. In fact, a core part of mentoring engineers involves educating them in how the organization works, to enable them to build a successful track record of getting things done.

Some managers and senior ICs often take a short-sighted view and see “shielding” non-senior folks from organizational politics as a way to help other engineers “maintain focus.”

Shielding non-senior engineers from organizational politics not only stymies their growth, but also limits their visibility into the skills they'll eventually need to learn the hard way. These are the kind of skills for which there exists no easy playbook.

2. Soft skills: these are hard skills!

This post doesn't aim to be a comprehensive guide on how to learn the skills that help one truly understand how an organization works, or even a comprehensive list of the skills themselves. Some of the points mentioned in this article that help one better understand how an organization works are simply ones I've encountered. If you ask someone else in a different organization, you might get a very different list. It's no great feat to learn a new skill when you know exactly what to learn and how to learn it, and when the answer is straightforward, as is the case with many purely technical concepts.

Learning “how your organization works” is a constant exercise in learning the organization’s ever-changing landscape, especially as people, projects, priorities, partners, and leadership change. Learning how to make decisions when key pieces of information are missing is also a very important skill, insomuch as it helps you hone another set of valuable skills:

how best to gather information you’re missing

how and when to get by without doing so

Some of these skills I’m talking about can be learned by talking to people and some need to be inferred through close observation of leadership’s decisions. There are some skills, however, that can only be learned the hard way by getting things wrong, or watching other people get things wrong. 

In organizations with a culture of constant learning, visibility into failures isn't something that's discouraged. At the same time, whether your organization subscribes to that school of thought of making failures visible is itself something you'd only learn by knowing how your organization works.

The most important skill for any engineer to possess is the ability to learn quickly. This applies to both technical concepts and sociotechnical concepts. I’m absolutely by no means an expert in any of these myself; but over the years, I like to think I’ve got a better understanding of why this knowledge is important.

3. Implicit hierarchies

Most organizations have a formal structure. They usually start with a VP or a Director at the top, and proceed down to individual teams. If you’re an IC, you’re a leaf node in the org tree.

Most organizations, in my experience, also tend to have something of an informal structure, especially among ICs. In organizations that make job titles and levels public, it's relatively easy to know which engineer might have more influence. In organizations where this is concealed, it's a lot harder to infer the informal hierarchy, and where exactly you fit into it. Sometimes, it has less to do with job titles and levels than with tenure on the team or in the organization. And sometimes, it's some other factor, like subject matter expertise, open-source experience, or even something as arbitrary as employment history.

It’s important to be aware of this informal hierarchy because as often as not, it may directly influence your work, irrespective of your personal level and job title.

Engineers who wield an outsized influence on the decision-making process tend to be fairly senior, and also fairly opinionated. It usually isn't any particular opinion they hold on a given topic that drives their decision making; rather, it's the overarching philosophies that guide their thinking.

These opinions could shape everything from:

the way your codebase is structured

to the tooling in use

to the way the team tests or deploys a system

to the way the system is architected

to the reason why the team did or didn’t choose a specific technology to work with, or a specific team to partner with

to the reason why some things that seem “broken” are never prioritized

and more.

These philosophies, and the opinions guided by them, can end up being the decisive factor in whether your efforts to change or improve the existing system bear fruit. Unless you understand "why" things are the way they are – there is often a method to the madness, if you're patient enough to dig deep – your proposal on "how" to improve the situation may end up going against the grain, making it that much more of an uphill task to get it accepted.

Furthermore, a well-intentioned proposal to fix something that appears obviously "broken" or "neglected" runs the risk of making you seem like someone who did not put in the effort to understand the history of the system. Being perceived as someone who did not do their homework doesn't exactly breed confidence that you should be entrusted with fixing it!

One of Amazon's Principal Engineering Tenets is "Respect What Came Before". Many systems that appear to be "broken" are worthy of respect, and efforts to evolve them must be tackled from multiple angles:

Understand the implicit organizational hierarchy

Identify the people who wield unusually high influence; understand their way of thinking and general philosophies. Do this by either talking to them or other people in the organization, by researching their work, reading any articles or blog posts they wrote, or talks they presented, etc.

Identify how their philosophies were successfully applied to projects and teams they worked on. Why were these efforts considered successful? What were the problems that were solved by these philosophies? What problems were made worse?

How do you build credibility with highly influential people within the organization? Can you lean on your past work? Your subject matter expertise? Your previous track record? Is there someone they trust and respect who can vouch for you, for them to take a leap of faith and agree to do things your way?

These are all things to consider before making proposals to change a system. Smaller changes might not require this level of rigor, and might in fact be a good way to net a lot of easy wins. But for anything more involved and more high impact, learning how and why your organization makes technical decisions is a non-negotiable requirement.

4. Cultures: top-down, bottom-up, and both at the same time

Irrespective of titles and hierarchies, most organizations also have a top-down or bottom-up culture, or a mix of both. In absolute terms, neither one is superior compared to the other. Microsoft is a top-down organization. Meta has a bottom-up culture. Both are extremely successful companies.

In top-down cultures, the most important decisions are made from above. The person making the final decision could be a tech lead, sometimes a manager, or a Director-level executive. On such teams, much of your success boils down to “managing up”. Successfully managing up requires grappling with questions about the decision maker, such as:

Are you on the same wavelength as them? Do you both attach the same salience to the problem at hand? If not, are you up to the task of impressing upon them its importance and urgency?

Is there some information or knowledge they have and you don’t, that informs their thinking on the matter? How best can you get this information?

Do you both share the same view of the opportunity cost?

What are their implicit and explicit biases? What are their blind spots? Can you use some of these to your advantage?

What are the things they generally value? What kind of work or behavior impresses them?

Is there any specific abstraction or process or methodology they are particularly attached to? Can you lean in on these to more effectively market your opinion to them?

What’s the timeline they are comfortable working with to solve the problem? A month? A performance cycle? Many years?

What’s your personal level of trust with them? Will they go to bat for you?

What does “success” mean to them and how do they measure it? How have they typically measured it for in-progress work?

How do they typically handle setbacks? Have you drawn up contingency plans and shared them?

How do they handle failure? Do they assume responsibility for it, or will you be scapegoated – and possibly fired?

Do they have a culture of blameless postmortems for large-scale team or organizational failures? Are these lessons shared and discussed transparently with everyone on the team and in the organization?

What is their experience of working with partner teams or organizations?

Have they been burned badly in the past when working with another organization or another team?

What’s their organizational reputation? Are they well-liked? Respected?

How conflict-averse or otherwise are they?

Knowing the answer to these questions can give you a sense of how best to identify problems and propose solutions, to see them through, and demonstrate a level of impact that might advance your career.

On bottom-up teams, the challenge is to manage laterally while also managing-up. This includes grappling with conundrums like:

How do you build consensus among your peers when there’s no top-down decision-making authority?

How do you break down barriers between peers?

How do conflicts get resolved if there’s no higher authority to mediate? Does it boil down to nitty-gritty quantitative details like metrics, or something more nebulous such as “likeability”?

If all key ideas have to originate from the bottom, which ones make it to the top? How has this worked in the past?

Can coding solve all issues? Can you prototype an idea you have and then successfully pitch it? Does your team or organization empower you to do this during business hours, or are you willing to spend your nights and weekends pursuing this goal?

Did someone already attempt to solve the problem you’re trying to fix? How did that go? What were the failures? Do you understand the proximate cause of any failures? Are you sure you won’t run into the same issues again?

What’s the opportunity cost? Can you convince your peers it’s worth solving right away if it hasn’t been prioritized to date?

What’s your scope of influence? Does it extend to your team, your sister teams, or your entire org? Are people outside your team willing to give your solution a whirl?

How do you convince people or teams with different incentives? Is this something you can even do without top-down support?

How do you ensure adoption, especially cross-organizational adoption?

How do you enlist partners or advocates for your effort? Are there other teams ready to adopt your solution, were you to just build it and advocate for it?

Do you have key relationships with the stakeholders? Do they trust you? If not, why not? And how would you go about building this trust?

How do you convince peers with bad experiences of your team or project in the past?

How do you build credibility?

How do you motivate and incentivize your peers in general?

What’s the cost of failure? Just one fair to middling performance cycle, or something worse? Who’ll be impacted; Just you, or your entire team?

What are the cultural problems? In a bottom-up setting where there’s no higher authority to mandate teams to change how they work, how do culture problems get fixed?

There are many organizations that are top-down in some respects and bottom-up in others. On such teams, you’d need to employ a mix of strategies to successfully thread the needle for many of these issues and chaperone your ideas through to successful execution. 

5. Get comfortable with the "mess"

Source: Cindy Sridharan on X

Most organizations value and reward people who “get things done”.

You’re far likelier to encounter codebases that have “evolved” over time, with poor documentation, lots of outdated comments and often with few to no tests, than you are to encounter ones which are perfectly documented, have well-tested public and internal APIs, and code which is perfectly obvious.

You’re going to be far more productive if you learn how to navigate such codebases successfully, which involves learning some of the following:

how to gather just the right amount of information to get on with your task

how not to get too caught up in the weeds, unless required

how to read a lot of code at a fast clip and come away with a reasonably good mental model of what it’s trying to do

how to come up with a hypothesis and to use a variety of general purpose techniques and tools to validate it

how to reproduce bugs quickly without elaborate local configurations and setups

These skills aren’t typically taught in college. They’re seldom talked about on social media or even at conferences. It plays well to the gallery to harp on about the importance of tests or documentation. I’m not trying to minimize their importance. But dealing with mess and ambiguity is a key skill to hone to improve your own productivity when working with code.

The same philosophy applies to working with sociotechnical systems like organizations: get comfortable with mess. You’re far likelier to encounter organizations comprising teams and leaders of:

varying levels of skill and ability to deliver on their promises

varying – sometimes opposing – incentives and reward structures

varying appetites for risk or change

varying philosophical views on software development and systems

varying levels of tolerance for failure

varying willingness to make investments in people and projects with a long-term view 

Being successful in “messy” organizations requires quickly learning the topology of the organization and charting pathways to navigate it. Your “personal ideal” may not match the reality on the ground. I’m cynical enough to believe everyone ultimately is looking out for their personal interest, and you need to look out for yours.

Get comfortable with mess and seek out ways to untangle it or work around it. Seek alignment when interests align. Be able to identify quickly when such alignment will always prove elusive. Be quick to dissociate amiably when interests clash irrevocably. Know when to batten down the hatches, but more importantly, also know when to cut your losses. Be transparent. 

Treat people with respect and humility, even when they disagree with you, or when you feel they are mistaken. Do this even when they seem to act against the best interests of the team or organization. It might very well be you who is failing to appreciate their predicament and you might be misunderstanding the reason for their actions. 

6. Look for small wins

It might take you way longer to truly get the measure of your organization’s sociotechnical politics, than to get up to speed with a codebase. 

To build credibility, you need to demonstrate some impact early on, instead of waiting months to get the lie of the land before you start getting anything done. Chasing small wins and low-hanging fruit can be an easy path to productivity. Don’t underestimate their importance.

7. Understand organizational constraints

Individual managers – much less ICs – can sometimes do only so much to solve the more entrenched organizational problems. DEI - Diversity, Equity and Inclusion - is one that quickly comes to mind. I’ve never seen this problem solved in a bottom-up manner successfully, anywhere. The vanishingly few organizations that did make modest progress often enjoyed executive buy-in. Organizations which were serious about DEI had executive compensation tied to the success of DEI efforts. 

Just how many organizations still remain committed to the principles of DEI in a post zero interest rates (ZIRP) world is unclear. I do expect this issue to become even more deprioritized in the current environment where companies are laser focused on profitability.

It’s folly for ICs or even managers to wade into fixing this - or any other issue -  solo, without explicit approval from their management chain, ideally with this work recognized in performance reviews. It’s one thing to truly feel passionate about a topic and to want to help create change; but please be realistic about expectations and outcomes. Charity Majors wrote a good post titled Know Your “One Job” And Do It First, and I largely agree with everything she says.

This is also applicable to a lot of other issues about “wholesale culture change.” Unless you’ve been hired with the explicit mandate to bring about a change in culture, i.e., at the executive level, you would be well-advised to be extremely wary of embarking on sweeping, ambitious projects or efforts.

That doesn’t mean you can’t create any change at all. The most effective instances of culture change I’ve seen have been incremental. It’s far easier to identify incremental wins when you’ve already learned the ropes by succeeding within the existing, flawed, cultural framework, than by starting from the ground up.

Another example is the promotion process, which is often perceived as a biased, opaque and arbitrary process at many companies. While the process might not work for certain ICs at a microlevel, the process is the way it is because it clearly works for the organization, based on whatever metrics the organization is tracking which you might not be privy to.

You can learn how the organization’s promotion process works and play your cards right. Or, if the process seems so arbitrary and unfair you feel you will never have a shot at succeeding, you can try to switch to organizations or companies where you feel you might have a fairer crack of the whip. 

Your manager might be able to elaborate on the whys and wherefores of this process, but managers have competing priorities to juggle and they cannot always guarantee their primary focus will be the career growth of all of their direct reports at all times. Which, again, is why you need to understand how your organization truly works, because you might then be able to seek out people other than your manager who might mentor you to better understand the organization’s way of doing things.

Conclusion

It’s easy to dismiss much of what’s in this post as “politics”. The unfortunate reality is that almost everything is political, and beyond a certain level, advancing further requires getting really good at playing this game.

Many engineers find it far easier to label things that don’t go their way as “politics”, as opposed to introspecting and learning the hard skills required to make better judgements. “Politics” doesn’t necessarily have to be a negative thing per se, and I suspect the near ubiquitous negative connotations attached to “politics” can be attributed to the fact that a lot of engineers aren’t the most astute when it comes to navigating these nuances. 

The truth is you can have a very gratifying and rewarding career as an engineer if you’re good at the “purely tech” side of things without ever worrying about the kind of problems described here.

But you’re far likelier to be one of those rare force multipliers if you’re also:

good at solving pressing problems

relentlessly getting things done

proactively creating iterative change

All of which requires understanding how your organization works.

This is Gergely, again.

Thank you to Cindy for this timely reminder on the importance of navigating your organization in order to become an effective engineer. You can follow Cindy on X, and read more of her writings on her blog.

The biggest takeaway from this article for me is this:

Software engineers frustrated at being "stuck" in their career often have made no real attempt to understand how their organization works. Answering questions like:

How do people pitch ideas that leadership pays attention to?

What are activities at this workplace that tend to get rewarded?

Who are the people who are accessible to me and are “in the know” for different areas?

What is the implicit hierarchy at my workplace? Who are the most important engineers / product people that everyone seems to seek out informal advice from?

Is my workplace culture actually top-down, bottom-up, or both?

Tech companies are far more messy than any of us engineers would like to admit. I have talked with several software engineers who work at prestigious tech companies – and yet, they tell me that inside it is a surprisingly large mess. “Mess” meaning one or more of: lots of tech debt with no plan to pay it down, antiquated processes, political games, respected engineers being frustrated and on the verge of leaving. 

When I worked at Skype, and then Uber, I also experienced the same: from the outside everything looked idyllic. From the inside, it felt like some parts of the company were held together either by duct tape or scaffolding that was so fragile that it was a miracle it did not collapse on itself.

It's good to have strong ideals about what "great" is: but understand the practicalities of "good enough." The most frustrated engineers I worked with were the ones who refused to let go of their idealistic way of working, and were upset that their organization would refuse to do things the "right" way (in their mind, that is). There is a fine line between always pushing for better technologies, processes, and approaches, and understanding when it's impractical to change the status quo. And – as Cindy reminded us – always start by understanding why technologies and processes have evolved to where they are at your current workplace.


IdM Laboratory

Formal Verification and Security Hardening of DIDComm

Hello, this is Fujie.

For a while, debate could be heard everywhere about whether DIDComm or OpenID for Verifiable Credentials (OID4VC) is the better protocol for credential exchange; lately that debate seems to have settled down, or so it feels from inside my echo chamber.

At the time, one of the arguments used against DIDComm was that it had not even been formally verified. It seems that has now, at last, been done.
What Did Come Out of It? Analysis and Improvements of DIDComm Messaging https://eprint.iacr.org/2024/1361


The abstract reads as follows:
Self-Sovereign Identity (SSI) empowers individuals and organizations with full control over their data. Decentralized identifiers (DIDs) are at its center, where a DID contains a collection of public keys associated with an entity, and further information to enable entities to engage via secure and private messaging across different platforms. A crucial stepping stone is DIDComm, a cryptographic communication layer that is in production with version 2. Due to its widespread and active deployment, a formal study of DIDComm is highly overdue. We present the first formal analysis of DIDComm’s cryptography, and formalize its goal of (sender-) anonymity and authenticity. We follow a composable approach to capture its security over a generic network, formulating the goal of DIDComm as a strong ideal communication resource. We prove that the proposed encryption modes reach the expected level of privacy and authenticity, but leak beyond the leakage induced by an underlying network (captured by a parameterizable resource). We further use our formalism to propose enhancements and prove their security: first, we present an optimized algorithm that achieves simultaneously anonymity and authenticity, conforming to the DIDComm message format, and which outperforms the current DIDComm proposal in both ciphertext size and computation time by almost a factor of 2. Second, we present a novel DIDComm mode that fulfills the notion of anonymity preservation, in that it does never leak more than the leakage induced by the network it is executed over. We finally show how to merge this new mode into our improved algorithm, obtaining an efficient all-in-one mode for full anonymity and authenticity.



I haven't gone through the details yet, but the fact that a proper, rigorous analysis process has been carried out seems like a good thing.

In any case, there are probably use cases, IoT in particular, where DID Comm is the better fit, so I hope we can keep choosing protocols appropriately for each situation.


Monday, 04. November 2024

Just a Theory

Introducing RFC 9535 Go JSONPath and Playground

I’ve made an RFC 9535 JSONPath Go package, and compiled it into Wasm to build an in-browser playground for it.

I’ve written and released an RFC 9535 JSONPath Go package, github.com/theory/jsonpath. Why? For a personal project, I needed a simpler JSONPath engine to complement the Go SQL/JSON Path package, and quickly found myself implementing most of the RFC. So I decided to do the whole thing.

Yes, yet another JSONPath package in Go. I really appreciate the idea of a standard — plus its support for features not included in the original design from 2007, such as object slices! But I could find no reference to the RFC on pkg.go.dev. Today the search shows one!

Example

Usage is straightforward; here’s a quick example (Go playground):

package main

import (
	"fmt"
	"log"

	"github.com/theory/jsonpath"
)

func main() {
	// Parse a jsonpath query.
	p, err := jsonpath.Parse(`$["name", "slogan"]`)
	if err != nil {
		log.Fatal(err)
	}

	// Select values from unmarshaled JSON input.
	json := map[string]any{
		"name":   "Kamala Harris",
		"title":  "Vice President of the United States",
		"home":   "California",
		"slogan": "We are not going back!",
	}
	items := p.Select(json)

	// Show the result.
	fmt.Printf("%#v\n", items)
}

And the output:

[]interface {}{"Kamala Harris", "We are not going back!"}

🛝 Playground

No need to write code to try it out, though. I’ve also written a playground webapp to encourage experimentation and exploration of the syntax and behavior of the package. The implementation follows the precedents set by the Go SQL/JSON Playground and design of the Rust JSONPath Playground. Moreover, thanks to TinyGo, the Wasm file comes in at a mere 254K!

The webapp loads sample JSON from the RFC, and randomly rotates through a few example JSONPath queries. Fill in your own and tap the “Permalink” button to share links. The Playground is a stateless JavaScript/Wasm web application: data persists only in permalink URLs.1

🛝 Try this example Playground permalink right now!

Status

The root jsonpath package is stable and ready for use. Other packages remain in flux, as I refactor and rejigger things in the coming weeks as part of the aforementioned personal project. But for actual JSONPath execution and querying, it should continue to work as-is for the foreseeable future.

I hope you find it useful.

And whatever data [GitHub Pages collect] 😔. ↩︎

More about… Go JSONPath Playground RFC 9535

NYTimes Tech Guild on Strike

My former colleagues in the New York Times Tech Guild have gone on strike in response to a number of unfair labor practice violations and management’s inability to bargain in good faith.

This is a big deal. My former colleagues in the New York Times Tech Guild have declared a strike today in response to a number of unfair labor practice violations and management’s inability to bargain in good faith. From the Washington Post:

Represented by the NewsGuild of New York, the Tech Guild has been negotiating its first contract with the company since 2022. Both sides have accused the other side of bogging down contract talks, which have recently focused on three key issues: “just cause” job protections (which ensure employees can’t be fired without reason and due process), remote work and pay equity. Unit members voted overwhelmingly in September to authorize the strike, saying the timing of the authorization — in the run-up to the high-profile election period — was “no accident.”

I urge you to join me in supporting the strike fund, which will help more vulnerable members to weather the strike.

Beyond that, the Guild asks supporters to refrain from interacting with Cooking and Games, and to break your streaks. Times reporter Maggie Astor has the details:

As of this morning, the New York Times Tech Guild, which represents NYT tech workers, is on strike.

Unless it’s resolved while today’s Wordle is still live, this streak is gone.

NYT Games and Cooking are BEHIND THE PICKET LINE. Please don’t play or engage with Games or Cooking content while the strike lasts!

News coverage is NOT behind the picket line. It’s okay to read and share that, though the site and app may have problems.

Read the whole thread for more information because, as she says, “the rules may not be what you think!”

More about… New York Times Strike News Guild New York Times Tech Guild

Michael Ruminer

Agents Craft Hackathon and Inspiration Block

When you want to do a hackathon but are missing the perfect idea. Compliments of Flux.1 Schnell

You likely know the saying, “Don’t let the perfect be the enemy of the good/done.” Sometimes I have a hard time living that, and a current effort is one example. I have registered for an AI agent hackathon, the Agents Craft Hackathon. I don’t plan to be competitive, and I doubt that is the goal of most participants either. I do want a good idea that makes a great example: a pseudo-novel idea that implements an agent usage not already in the GenAI_Agents GitHub repo. Therein lies my problem: a unique and interesting use case and example. I’m inspiration-blocked. I know that I am letting the perfect be the enemy of the good, or even the enemy of the fun, because there is a Discord channel with participant-created ideas for teams to work on. I want it simple yet inspired. I’m not inspired by any of the existing team projects; nothing against those projects, as this is a me problem. I have a few more days to think about it.

Maybe you should join the hackathon. Your idea might be inspiring.


Aaron Parecki

My IETF 121 Agenda

Here's where you can find me at IETF 121 in Dublin!

Here's where you can find me at IETF 121 in Dublin!

Monday
9:30 - 11:30 • oauth
15:30 - 17:00 • alldispatch

Tuesday
9:30 - 11:30 • oauth
13:00 - 14:30 • spice
16:30 - 17:30 • scim

Thursday
9:30 - 11:30 • oauth

Get in Touch
LinkedIn
GitHub
Mastodon
aaron@parecki.com

My Current Drafts
OAuth 2.1
OAuth for Browser-Based Apps
OAuth Protected Resource Metadata
OAuth for First-Party Apps
OAuth Client ID Metadata Document
Identity Assertion Authorization Grant
OAuth Global Token Revocation

IdM Laboratory

The Digital Identity Study Group Advent Calendar Season Is Here Again This Year


Hello, this is Fujie.

It's that time of year again for the annual Advent Calendar. (So soon...)


The Digital Identity study group (#iddance) Advent Calendar has been set up as well, so please take this opportunity to join in.

https://qiita.com/advent-calendar/2024/iddance



Damien Bod

ASP.NET Core and Angular BFF using a YARP downstream API protected using certificate authentication


This article demonstrates how to implement a downstream API protected by certificate authentication using Microsoft YARP reverse proxy in an ASP.NET Core web application. The application uses Angular for its UI and secures both the UI and the ASP.NET Core backend through a backend-for-frontend security architecture. The downstream API is secured with certificate authentication and deployed to an Azure App Service. The API requires a known client certificate for access.

Code https://github.com/damienbod/bff-aspnetcore-angular-downstream-cert-api

Setup

The application was built using the following github repository:

https://github.com/damienbod/bff-aspnetcore-angular

The web application is secured with an OpenID Connect confidential client using PKCE. The UI is built with Angular, while the backend is implemented using ASP.NET Core. The API is protected with certificate authentication, is deployed to an Azure App Service, and requires the correct client certificate for access; the server ensures that the correct certificate is used to access the API.

The YARP reverse proxy is set up in the startup of the ASP.NET Core application and in the appsettings.json file. The client certificate required for the downstream API is loaded into the project using the X509Certificate2 class. It could be loaded from the operating system, from Azure Key Vault, or in some other secure way; the demo loads it directly in an insecure way.

The AddReverseProxy method adds the YARP definitions and the ConfigureHttpClient method is used to add the SslOptions containing the client certificate which is used for the Azure API calls. The MapReverseProxy is used to add the endpoints and reads the configuration from the app settings. The proxy is also used in development to load the Angular project. Depending on the UI packages, different setups are required.

var builder = WebApplication.CreateBuilder(args);

builder.WebHost.ConfigureKestrel(serverOptions =>
{
    serverOptions.AddServerHeader = false;
});

var services = builder.Services;
var configuration = builder.Configuration;

var stsServer = configuration["OpenIDConnectSettings:Authority"];

services.AddSecurityHeaderPolicies()
    .SetPolicySelector((PolicySelectorContext ctx) =>
    {
        return SecurityHeadersDefinitions.GetHeaderPolicyCollection(
            builder.Environment.IsDevelopment(), stsServer);
    });

services.AddAntiforgery(options =>
{
    options.HeaderName = "X-XSRF-TOKEN";
    options.Cookie.Name = "__Host-X-XSRF-TOKEN";
    options.Cookie.SameSite = SameSiteMode.Strict;
    options.Cookie.SecurePolicy = CookieSecurePolicy.Always;
});

services.AddHttpClient();
services.AddOptions();

services.AddAuthentication(options =>
{
    options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme;
    options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme;
})
.AddCookie()
.AddOpenIdConnect(options =>
{
    configuration.GetSection("OpenIDConnectSettings").Bind(options);
    //options.Authority = configuration["OpenIDConnectSettings:Authority"];
    //options.ClientId = configuration["OpenIDConnectSettings:ClientId"];
    //options.ClientSecret = configuration["OpenIDConnectSettings:ClientSecret"];
    options.SignInScheme = CookieAuthenticationDefaults.AuthenticationScheme;
    options.ResponseType = OpenIdConnectResponseType.Code;
    options.SaveTokens = true;
    options.GetClaimsFromUserInfoEndpoint = true;
    options.TokenValidationParameters = new TokenValidationParameters
    {
        NameClaimType = "name"
    };
});

services.AddControllersWithViews(options =>
    options.Filters.Add(new AutoValidateAntiforgeryTokenAttribute()));

services.AddRazorPages().AddMvcOptions(options =>
{
    //var policy = new AuthorizationPolicyBuilder()
    //    .RequireAuthenticatedUser()
    //    .Build();
    //options.Filters.Add(new AuthorizeFilter(policy));
});

builder.Services.AddSingleton<ApiTokenCacheClient>();

// Create an authorization policy used by YARP when forwarding requests
builder.Services.AddAuthorization(options =>
    options.AddPolicy("CookieAuthenticationPolicy", builder =>
    {
        builder.AddAuthenticationSchemes(CookieAuthenticationDefaults.AuthenticationScheme);
        builder.RequireAuthenticatedUser();
    }));

var cert = new X509Certificate2("client.pfx", "pfx-certificate-secret");

services.AddReverseProxy()
    .ConfigureHttpClient((context, handler) =>
    {
        handler.SslOptions = new SslClientAuthenticationOptions
        {
            ClientCertificates = [cert]
        };
    })
    .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy"));

The ReverseProxy settings add the configuration, which is mostly standard and follows the documentation. The CookieAuthenticationPolicy is used to accept only authenticated requests. YARP is set up differently for development and production.

"ReverseProxy": { "Routes": { "route1": { "ClusterId": "cluster1", "AuthorizationPolicy": "CookieAuthenticationPolicy", "Match": { "Path": "{**catch-all}" } } }, "Clusters": { "cluster1": { "HttpClient": { "SslProtocols": [ "Tls12" ] }, "Destinations": { "cluster1/destination1": { "Address": "https://azurecertauth20201108214641.azurewebsites.net/" } } } } }

Downstream API protected using certificate authentication

The downstream API uses certificate authentication to protect the API. In local development, the API requires the certificate forwarding middleware and must allow any certificate type, including self-signed certificates. When the application is deployed to an Azure App Service, the certificate forwarding middleware is not required. The API validates the thumbprint of the certificate and returns a 403 if the certificate authentication in ASP.NET Core fails; the certificate itself has already been validated before it reaches the application, so this check is really authorization.

using Microsoft.AspNetCore.Authentication.Certificate;
using Microsoft.AspNetCore.Server.Kestrel.Https;
using Microsoft.IdentityModel.JsonWebTokens;
using Microsoft.IdentityModel.Logging;
using Serilog;
using System.Security.Claims;
using System.Security.Cryptography.X509Certificates;

namespace DownstreamApiCertAuth;

internal static class StartupExtensions
{
    public static WebApplication ConfigureServices(this WebApplicationBuilder builder)
    {
        var services = builder.Services;
        var configuration = builder.Configuration;

        if (builder.Environment.IsDevelopment())
        {
            builder.WebHost.ConfigureKestrel((context, serverOptions) =>
            {
                serverOptions.ConfigureHttpsDefaults(listenOptions =>
                {
                    listenOptions.ClientCertificateMode = ClientCertificateMode.RequireCertificate;
                    listenOptions.AllowAnyClientCertificate();
                });
            });
        }

        services.AddSingleton<MyCertificateValidationService>();

        services.AddCertificateForwarding(options =>
        {
            options.CertificateHeader = "X-ARR-ClientCert";
            options.HeaderConverter = (headerValue) =>
            {
                Console.WriteLine("headerValue: " + headerValue);
                X509Certificate2? clientCertificate = null;

                if (!string.IsNullOrWhiteSpace(headerValue))
                {
                    byte[] bytes = Convert.FromBase64String(headerValue);
                    clientCertificate = new X509Certificate2(bytes);
                }

                return clientCertificate!;
            };
        });

        services.AddAuthentication(CertificateAuthenticationDefaults.AuthenticationScheme)
            .AddCertificate(options => // code from ASP.NET Core sample
            {
                // https://docs.microsoft.com/en-us/aspnet/core/security/authentication/certauth
                options.AllowedCertificateTypes = CertificateTypes.All;
                // Default values
                //options.AllowedCertificateTypes = CertificateTypes.Chained;
                //options.RevocationFlag = X509RevocationFlag.ExcludeRoot;
                options.RevocationMode = X509RevocationMode.NoCheck;
                options.ValidateCertificateUse = false;
                options.ValidateValidityPeriod = false;

                options.Events = new CertificateAuthenticationEvents
                {
                    OnCertificateValidated = context =>
                    {
                        var validationService = context.HttpContext.RequestServices.GetService<MyCertificateValidationService>();

                        if (validationService!.ValidateCertificate(context.ClientCertificate))
                        {
                            var claims = new[]
                            {
                                new Claim(ClaimTypes.NameIdentifier, context.ClientCertificate.Subject, ClaimValueTypes.String, context.Options.ClaimsIssuer),
                                new Claim(ClaimTypes.Name, context.ClientCertificate.Subject, ClaimValueTypes.String, context.Options.ClaimsIssuer)
                            };

                            context.Principal = new ClaimsPrincipal(new ClaimsIdentity(claims, context.Scheme.Name));
                            context.Success();
                        }
                        else
                        {
                            context.Fail("invalid cert");
                        }

                        return Task.CompletedTask;
                    },
                    OnAuthenticationFailed = new Func<CertificateAuthenticationFailedContext, Task>(context =>
                    {
                        Console.WriteLine("OnAuthenticationFailed: " + context.Exception.Message);
                        return Task.CompletedTask;
                    })
                };
            });

        services.AddAuthorization();
        services.AddControllers();

        return builder.Build();
    }

    public static WebApplication ConfigurePipeline(this WebApplication app)
    {
        IdentityModelEventSource.ShowPII = true;
        JsonWebTokenHandler.DefaultInboundClaimTypeMap.Clear();

        app.UseSerilogRequestLogging();

        if (app.Environment.IsDevelopment())
        {
            app.UseDeveloperExceptionPage();
        }

        app.UseHttpsRedirection();
        app.UseRouting();

        if (app.Environment.IsDevelopment())
        {
            app.UseCertificateForwarding();
        }

        app.UseAuthentication();
        app.UseAuthorization();

        app.MapControllers();

        return app;
    }
}

When the application is started, the web application can use the downstream API protected using certificates.

Notes

This works well, and the downstream APIs can be protected in a secure way using certificates. With this approach it is easy to raise the security level and implement app-to-app security properly. Of course, security is only as good as the weakest link: the Angular UI calling the cookie-protected API is now that weakest link, depending of course on how you secure the downstream APIs, but getting this right should not be a problem.

Links

https://github.com/microsoft/reverse-proxy

https://github.com/damienbod/Blazor.BFF.OpenIDConnect.Template

https://docs.microsoft.com/en-us/aspnet/core/security/authentication/certauth

https://github.com/openiddict/openiddict-samples

https://github.com/damienbod/AspNetCoreCertificates

Sunday, 03. November 2024

Wrench in the Gears

Cheri Honkala, Philadelphia Housing Rights Activist, Aligns With Blockchain Game To Train Benevolent AI And Catalyze A Digital World Constitution


Below is a summary of my most recent post from Hot Springs, Arkansas. It just goes to show how fragile our perceived “reality” is in a world increasingly governed by the physics of information. It truly is a fascinating time to be alive.

With digital media working overtime to keep everyone in their assigned “reality tunnels” it may be hard to contemplate new wrinkles to the story that are so bizarre as to appear over-the-top spectacle. And yet, Elowyn, a blockchain Decentralized Autonomous Organization, with bottom-up game mechanics and in-game token economy linked to “peace,” “healing,” and “sustainability” has been designed with financial support from Ben Goertzel (the creator of Sophia the robot’s brain) and Michaela Ulieru (Singularity.net’s Chief Alchemist) to generate psychological data that will supposedly train “benevolent AI.”

Download White Paper Here: https://docsend.com/view/zfiqunqgxm98tbt4 Source: https://www.youtube.com/watch?v=BUy1YcYAzcA The leader of the Elowyn initiative, Anneloes Smitsman, based on the tiny volcanic island of Mauritius in the Indian Ocean, has also been involved with educational programs in partnership with systems theorist Alexander Laszlo and Russian transhumanist Pavel Luksha. Both are affiliated with Global Education Futures that has footprints, through Tom Vander Ark’s former ed-tech investment fund Learn Capital, on Powder Mountain, Utah and in Arena Hall in Austin, Texas. Source: https://wrenchinthegears.com/wp-content/uploads/2024/11/EvolutionaryLearningEcosystemsforThrivableFuturesCraftingandCuratingtheConditionsforFutureFitEducation.pdf Source: https://x.com/SingularityNET/status/1763227101426061353 The backer of Smitsman’s education initiative, piloted in several Catholic schools on the island, was backed by Investec, an Australian bank that is deeply involved in human capital bets on children. Smitsman has conducted research in South Africa and Australia, and both countries are hotbeds of blockchain impact finance. One of the advisors to Smitsman’s “Earthwise” organization and promoter of the archetypal card game to “fight Moloch” is Laura George, ordained priest of the interfaith order of Melchizedeck, human potential movement advocate, and leader of the “Peace Pentagon” situated along the ancient New River in the mountains of Virginia. Laura George and Glen Martin, philosophy professor at Radford College, are working to develop a worldwide constitution in coordination with Smitsman’s benevolent AGI program through the Peace Pentagon’s umbrella organization “The Oracle Institute.” Source: https://www.youtube.com/watch?v=tJj2RMceK_8 Source: https://www.grapevine.org/giving-circle/bpWhMbq/Oracle-Institute-EARTHwise-Centre-Global-Initiatives Source: https://earthconstitution.world/glen-martin/ Source: https://www.youtube.com/watch?v=pb5lypcN5jU Source: https://medium.com/singularitynet/governing-the-ungovernable-crafting-a-global-constitution-for-agi-b7485e44948f Imagine my shock that when pulling this thread I find Philadelphia housing rights activist, Cheri Honkala, listed as an advisor to George’s Institute. Cheri’s photo is featured in the same row as Anneloes Smitsman. When I contacted Cheri with documentation of the Elowyn DAO and the planned AGI world constitution, the only response I received was a brief text thanking me for the information. When I requested a follow up, so that I could accurately represent Cheri and the Poor People’s Economic Human Rights Campaign’s relationship to this effort I received no further response, which leaves me presuming tacit endorsement. Source: https://web.archive.org/web/20240804195108/https://www.theoracleinstitute.org/team Source: https://www.facebook.com/PeacePentagon/videos/vlog-14-cheri-honkala-and-the-woman-warrior-retreat/622965811518184/?locale=ms_MY&_rdr Source: https://www.facebook.com/OracleInstitute/posts/donate-now-httpsmightycausecomstorypeace-pentagon-skylightbarbara-marx-hubbard-v/10155598112917395/ Source: https://peacepentagon.net/about-our-community The beginning of this video also discusses author Robert Anton Wilson’s ideas, in partnership with Timothy Leary’s research on Eight Circuits of Consciousness, about “reality tunnels.” Reality Tunnels are the stories we live inside to make sense of the world. 
I personally am not comfortable with a reality tunnel where housing insecure people are sponsored by impact investors to acquire crypto card decks to fight Moloch, bend time, and train Sophia the Robot on blockchain while being subjected to behavioral hazing, but I suppose to some that model may seem appropriate. It certainly wasn’t what I signed up for when I worked with PPEHRC in Philadelphia to try and raise awareness about digital twins, blockchain, social impact finance and cybernetics. Source: https://wrenchinthegears.com/2019/10/20/pay-for-success-the-opioid-crisis-poor-peoples-economic-human-rights-campaign-radical-social-work-breakfast-presentation/ Jason Bosch also had close connections to Cheri and PPEHRC for many years. In fact that is how he connected with me after seeing a poor quality video I had made on my research topics during a PPEHRC-sponsored presentation at the Wooden Shoe bookstore in the spring of 2019. Throughout the lock downs Jason attempted to set up a conversation with Cheri Honkala about our findings on Web3 and our concerns about how the planned digital governance layer would impact poor communities and vulnerable populations. He was never able to pin Cheri down for a time to talk about it privately or on video. Source: https://www.youtube.com/watch?v=wQuOAMXyODs Looking back on my past experiences, I am having a very difficult time making sense of this development. I think most people lead their lives trying to have agency and make the world a better place. Does Cheri truly believe in gamified “benevolent AI” psychological profiling and a planned world constitution for universal “peace” even when it is clear that all of the players are aligned with the United Nations Sustainable Development Goals (the structure for cybernetic systems engineering of people and populations into recursive fitness landscapes to improve their human potential)? At this point I can only guess, because communication channels are jammed. Source: https://docsend.com/view/zfiqunqgxm98tbt4 As the events of this week unfold, remember there are many things happening in the background that won’t make it into the main current of the media feeds for years to come. Nonetheless it may very well be those things are are most important to comprehend as we navigate the unbelievable developments that are unfolding. At the end of the day was it just one big Live-Action-Role-Play performance art piece? I close scratching my head and thinking again about the inversions in the plot of John Fowles’ 1965, “The Magus.”  Source: https://wrenchinthegears.com/2019/06/26/pay-for-success-finance-preys-upon-the-poor-presentation-at-left-forum-6-29-19/ Addendum: After I published this post, which I sent directly to Cheri and the associate mentioned in the video, she texted me back to say that she was no longer involved with The Oracle Institute. I had hoped that she would have conveyed that information to me when I contacted her twice  last week to ask for clarification in advance of sharing this research publicly. It was not my intent to skewer Cheri or PPEHRC, but this project is too important to simply be set aside as something to be considered on another day, especially for people ostensibly organizing vulnerable populations at a grassroots level. I wrote back to Cheri stating that I would be very interested in reading her take, or listening to a video of hers, on Laura George’s organization now that she has the larger context related to Smitsman, Earthwise, and Elowyn. 
The Oracle Institute’s “Peace Pentagon” identifies as an eco-village, a concept with tentacles going back to Findhorn in the UK and the origins of the New Age movement. Cheri was Jill Stein’s VP running mate on the Green Party ticket in the 2012 presidential race. When I was associated with PPEHRC we drove to Chicago and attended a day-long event hosted by the eco-socialist contingent of the Green Party, including Kali Akuno of Cooperation Jackson (a non-profit that has a Fab-Lab funded by MIT) and David Cobb who is involved with Cooperation Humboldt’s solidarity economy. I personally would be interested in hearing more from Cheri or other Green Party affiliates about how the Smitsman, Ulieru, Goertzel plan to use crypto gaming to train “benevolent AI” may intersect (or not) with the party’s platform around cooperatives, the environment, and “peace.”  Source: https://www.gp.org/the_four_pillars

 

Source: https://www.gp.org/building_the_movement_for_eco_socialism Source: https://www.gp.org/kali_akuno_2021_gp_keynote

 

Saturday, 02. November 2024

IdM Laboratory

A Vulnerability in Okta's AD/LDAP Delegated Authentication (DelAuth) Module


Hello, this is Fujie.

Okta has reported a vulnerability in its AD/LDAP Delegated Authentication (DelAuth) module.


https://trust.okta.com/security-advisories/okta-ad-ldap-delegated-authentication-username/

The gist of the report: in environments using the AD/LDAP DelAuth module, if a cached login has succeeded in the past and the username is 52 characters or longer, a login can succeed without the password (using the cache alone).

The following is quoted from the site above.

On October 30, 2024, a vulnerability was internally identified in generating the cache key for AD/LDAP DelAuth. The Bcrypt algorithm was used to generate the cache key where we hash a combined string of userId + username + password. During specific conditions, this could allow users to authenticate by only providing the username with the stored cache key of a previous successful authentication.

Note: A precondition for this vulnerability is that the username must be or exceed 52 characters any time a cache key is generated for the user.



This has apparently already been fixed in Okta's production environment, but if you use this module (and if you federate with AD you are almost certainly using it), it would be wise to check whether any of your users have usernames of 52 characters or more and whether there are any signs of intrusion. Looking at the timeline, the vulnerable module was released on 2024/7/23 and the issue was discovered and fixed on 2024/10/30, so roughly three months passed without anyone noticing.
For those wondering what AD/LDAP DelAuth even is, a quick explanation: in short, it is the module that lets you log in to Okta in the cloud using your on-premises AD or LDAP password.
The AD version is here: https://help.okta.com/en-us/content/topics/security/enable_delegated_auth.htm The LDAP version is here: https://help.okta.com/en-us/content/topics/security/security_authentication.htm
Roughly how it works: when an authentication request reaches Okta, the request is delegated to the on-premises AD/LDAP; on success, a cache of the password hash is stored on the Okta cloud side, and while the cache is valid, subsequent authentications are handled entirely on the cloud side without querying the on-premises directory.
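As a rough illustration of why the 52-character precondition matters (this is my own sketch, not Okta's code): bcrypt implementations generally hash only the first 72 bytes of their input, so once the userId plus a long username fill that window, the password no longer influences the cached key. The userId format and lengths below are hypothetical.

# Illustrative sketch only -- not Okta's implementation.
# bcrypt only feeds roughly the first 72 bytes of its input into the hash,
# so anything beyond that has no effect on the resulting cache key.
BCRYPT_INPUT_LIMIT = 72

def bcrypt_effective_input(user_id: str, username: str, password: str) -> bytes:
    combined = (user_id + username + password).encode("utf-8")
    return combined[:BCRYPT_INPUT_LIMIT]  # the only bytes bcrypt actually looks at

user_id = "00u1234567890ABCDEFG"        # ~20 characters (hypothetical user id)
username = "a" * 52 + "@example.com"    # meets the ">= 52 characters" precondition

right = bcrypt_effective_input(user_id, username, "correct-password")
wrong = bcrypt_effective_input(user_id, username, "totally-wrong")

# user_id + username alone already exceeds 72 bytes, so both inputs are
# identical and the cached key matches regardless of the password supplied.
print(len((user_id + username).encode("utf-8")))  # 84
print(right == wrong)                             # True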
Even though it has already been fixed, this is quite a vulnerability... That said, usernames of 52 characters or more are probably not all that common.

Jon Udell

How To Create Software Diagrams With ChatGPT and Claude


Earlier efforts to diagram software with LLM assistance weren’t fruitful, but this time around things went really well. I ended up with exactly what I needed to explain the architecture of a browser extension, and along the way I learned a lot about a couple of formats — Mermaid and Graphviz — as well as their tool ecosystems.

How To Create Software Diagrams With ChatGPT and Claude

LLM series at The New Stack


@_Nat Zone

Is There No Future for VCs (Verifiable Credentials)?


There is a blog post that is currently the talk of the identity industry: the October 15 post "Why Verifiable Credentials Aren’t Widely Adopted & Why Trinsic Pivoted" by Hughes, the founder of Trinsic, a company that used to offer decentralized ID / Verifiable Credentials (VC) solutions on the Sovrin blockchain. A rough summary follows.

Why Verifiable Credentials Aren't Widely Adopted & Why Trinsic Pivoted

Overview of Trinsic's pivot
- At IIW38 he gave a session announcing that "SSI didn't work; Trinsic is pivoting."
- The audience responded favorably, with applause and requests for a live stream.
- Industry veterans recognized the significance of the session.

Key premises
- Premise #0: Enthusiastic support – the author remains committed to the ideals of self-sovereign identity (SSI).
- Premise #1: Terminology – he uses terms such as verifiable credentials (VC) and self-sovereign identity (SSI) almost interchangeably, to mean technology that lets attributes be transferred in a trustworthy, user-centric way.
- Premise #2: Market reality – markets are unpredictable; a successful product needs product/market fit.
- Premise #3: Adoption matters – the best technology is defined by its adoption and its impact on users.
- Premise #4: Avoid wasting time – encourage innovation, but warn against known failure modes in the market.

Historical background
- It started with the Sovrin Foundation in 2017, and Trinsic was founded in 2019.
- Early on they succeeded in signing up hundreds of developers, but successful customer stories were limited to about 1 in 500.
- Market trends showed a preference for proprietary solutions over verifiable credentials.

Identified challenges
- Interoperability problems – no true interoperability among verifiable credentials (VCs); standards are not universally agreed upon.
- User experience (UX) concerns – the initial UX of VCs is worse than existing solutions, making people reluctant to adopt them.
- Fragmented adoption – the lack of a dominant use case or geographic concentration prevents network effects.

Outlook
- Government involvement – government mandates may drive VC adoption, though this is a medium-term prospect.
- Timing and market readiness – the claim that VCs are simply too early is challenged by the success of proprietary solutions.
- Trinsic's new strategy focuses on helping companies accept existing digital IDs rather than create new ones, anticipating continued fragmentation of the identity landscape.

Dick Hardt's view

While the article has drawn many favorable reactions, some say it falls short as a statement of the real problems. For example, this LinkedIn post by Dick Hardt, which makes the following points.

- It is unfortunate that SSI, verifiable credentials (VCs), and decentralized identity got conflated. The essence of SSI is giving users control over their identity, but people are too attached to a particular technology stack.
- When he left Amazon five years ago he surveyed the personal digital identity landscape; decentralized identity had already been around for quite a while, but he did not expect it to spread rapidly the way OpenID 2.0, OAuth 2.0, and OpenID Connect (OIDC) did, because it had the same problems as OAuth 1.0: it was complex and did not build on existing patterns.
- The main technical problems with VCs:
  - Users are expected to manage cryptographic keys, which creates UX challenges and a barrier when migrating devices
  - Issuer information is disclosed (for example, an age check also reveals where you live)
  - Key reuse makes user behavior trackable; workarounds such as batch issuance are being explored, but they make implementation and operations even more cumbersome
- The biggest challenge is the business model:
  - Users will not pay for wallets
  - Issuers invest in infrastructure, but the value accrues to verifiers
  - Existing businesses that sell personal data have little incentive to move to VCs
  - There is no visibility into information moving around in the back end

This largely matches my own view, and for that matter I believe the late Vittorio Bertocci, who appears in Hughes's article, saw it the same way.

So what "proprietary standards" were actually used?

Another thing I cannot understand about Hughes's article is why OpenID Connect never comes up. itsme, YOTI, ID.me, and PLAID, which are cited as having grown by adopting proprietary standards, actually use OpenID Connect. That is hardly proprietary.

The article also says that AI services such as ChatGPT went with OAuth rather than VCs, but these too are OpenID Connect. If, as Hughes says, market reality and adoption are what matter, then OpenID Connect was simply the better technology for these use cases.

So is there no future for VCs?

So, is there no future for VCs? I am not as pessimistic as Hughes. But I do think the fragmented adoption he describes is a problem. OpenID Connect simplifies its technology stack and operations by handling only the case where both the issuer and the device are online. VCs need a similar, but differently targeted, focus and willingness to make trade-offs: concentrate on the areas other than the use cases where OIDC has proven superior. Today's VCs try to solve everything. It is like trying to build an amphibious car that can also run on rails.

(Figure 1) An amphibious car that can also run on rails

What I have long wanted from VCs is this:

- They can keep being used even if the issuer goes offline or ceases to exist
- They can keep being used even when the device is offline
- Users are not left confused about which wallet they should use
- When not subsidized by public money, value flows back from verifiers to issuers

namely, to establish the technology stack and ecosystem by concentrating on use cases that satisfy these requirements.

In addition,

having issuer information disclosed only as group-level information

would be even better. This is presumably a problem that can be solved once issuers start using group signatures, right, everyone in the Sako Lab?

If we concentrate on these areas first, I think a path to winning will start to come into view.

Also, as I have pointed out before, things like the query language must be drastically simplified. Presentation Exchange is far too complex. Even OpenID's claims syntax was too complex to catch on. I think even DCQL is too complex. For a start, just the simplified subset of DCQL (which finally brings it down to roughly the complexity of the claims syntax) is enough.

There is more I want to say, but let's leave it here for today.

Oh, and one more thing: I don't know whether it is related to Trinsic's departure, but it has also been decided that the Sovrin blockchain will shut down. It has become a good example of how a blockchain does not guarantee permanence.

See you next time!


IdM Laboratory

Public Review Has Started for the New Implementer's Draft of OpenID for Verifiable Presentations


Hello, this is Fujie.

As covered at IIW and at the OpenID Foundation Workshop the day before it, the public review period for the new Implementer's Draft of OpenID for Verifiable Presentations has begun.



Announcement

https://openid.net/public-review-period-for-proposed-third-implementers-draft-of-openid-for-verifiable-presentations-specification-3/


The main updates are as follows.

- Introduces the Digital Credentials Query Language; this is an alternative to Presentation Exchange
- Introduces the transaction data mechanism that enables a binding between the user's identification/authentication and the user’s authorization, for example to complete a payment transaction, or to sign specific document(s) using QES (Qualified Electronic Signatures).
- Removes the client_id_scheme parameter and instead makes the client id scheme a prefix on the client_id; this addresses a security issue with the previous solution.
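To make the third bullet concrete, here is a rough sketch of the parameter shape before and after the change, as I read the draft; the verifier URL is made up and the exact prefix strings are defined by the specification itself, so check the text before relying on them.

# Illustrative only: the client identifier scheme moves from a separate
# parameter to a prefix on client_id (all values here are hypothetical).

# Before: two separate authorization request parameters
old_request_params = {
    "client_id": "https://verifier.example.com/cb",
    "client_id_scheme": "redirect_uri",
}

# After: the scheme is carried as a prefix of client_id itself
new_request_params = {
    "client_id": "redirect_uri:https://verifier.example.com/cb",
}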



The upcoming schedule is:

- Implementer's Draft public review period: Friday, November 1, 2024 to Sunday, December 16, 2024 (45 days)
- Implementer's Draft vote announcement: Tuesday, December 3, 2024
- Implementer's Draft early voting opens: Tuesday, December 10, 2024
- Implementer's Draft official voting period: Tuesday, December 17 to Tuesday, December 24, 2024
It looks like it will be approved before the end of the year. If you are in the middle of an implementation, start thinking now about how you will handle the changes.

@_Nat Zone

Internet Identity Workshop (IIW) Week Wraps Up


IIW week, the second of three back-to-back event weeks that kicked off with SIDI Hub Tokyo the week before, has come to an end.

OpenID Foundation Workshop

First up was the OpenID Foundation Workshop on Monday afternoon.

As usual, Fujie-san has already written it up, so there is nothing in particular for me to add.

» IdM Laboratory: OpenID Foundation Workshop Quick Review https://t.co/918adtcuhk

— Nat Sakimura/崎村夏彦 (@_nat) November 2, 2024

After that came the board meeting and the board dinner. I'll just post the dinner menu.

IIW 39

IIW ran for three days, Tuesday through Thursday. Here too my coverage largely overlaps with Fujie-san's, so reading IdM Laboratory will mostly do.

» IdM Laboratory: IIW 39 Day 1 Quick Review https://t.co/ctz2oJkkpQ

— Nat Sakimura/崎村夏彦 (@_nat) November 2, 2024

» IdM Laboratory: IIW 39 Day 2 Quick Review https://t.co/1otMtixF8d

— Nat Sakimura/崎村夏彦 (@_nat) November 2, 2024

» IdM Laboratory: IIW 39 Day 3 Quick Review https://t.co/2cDZE4JFre

— Nat Sakimura/崎村夏彦 (@_nat) November 2, 2024

The sessions that don't overlap are roughly these:

- (1-B) OAuth 101 (Aaron)
- (2-B) OpenID Connect 101 (Mike)
- (3-A) SD-JWT VC over proximity/offline (Lee, Cam, Torsten, John, Oliver, Kristina Yasuda)
- (6-N) FAPI 101 #openbanking #opendata (Daniel, Joseph, Nat)
- (8-I) Why is the OpenID Foundation hopping right now? An overview of the 14 work groups and community groups on now. (Nat Sakimura + Gail Hodges)
- (9-A) RP Authentication & Authorization (EUDIW) (Torsten Lodderstedt, Giuseppe, Dima)

I don't think the 101 sessions or (8-I) need a writeup, so I'll just briefly cover (3-A) and (9-A).

(3-A) Proximity/Offline Presentation of SD-JWT VC (Lee, Cam, Torsten, John, Oliver, Kristina Yasuda)

SD-JWT VC over proximity/offline

The problem: there is no way to present an SD-JWT VC when the wallet is offline.

- mdoc: offline (wallet): ISO 18013-5; over the internet: OID4VP (incl. Browser API)
- SD-JWT VC: offline (wallet): what do we do here?; over the internet: OID4VP (incl. Browser API)

The plausible options look roughly like this.

Options compared: Extend ISO 18013-5 deviceRequest | Extend ISO 18013-5 OID4VP request | OID4VP over BLE | OID4VP with CTAP

- Standard body: ISO? | ISO? | OpenID? | FIDO
- Device engagement / channel establishment: QR or NFC | BLE? | QR or NFC
- Changes required: Extend deviceRequest / Response | Add OID4VP request / response | None? | NFC needs to be added (happening anyway for cross-device flows)
- Deployment considerations: Aligned with 18013-5 existing deployments, not aligned with OID4VP online presentation | Aligned with 18013-5 existing deployments, not aligned with OID4VP online presentation | Aligned with OID4VP online presentation and existing CTAP deployments, not aligned with 18013-5 existing deployments
- Feature parity between online and offline: N | N | Y
- Live implementations: Y | Y | ?? | Y, growing VERY fast
- Format: CBOR | JSON | JSON
- Built at: App level | App level | App level | OS or app level
- Migration: Not required? | Required | Required
- Reliability: Y | Y | N | Y
- Standard extension can be done in ISO or outside of ISO | Standard extension can be done in ISO or outside of ISO | Couldn't use ISO | Secure tunnel between 2 devices; can send arbitrary
- Invocation: QR code goes through the cloud; NFC is possible but was taken out | QR code + BLE? | Future UWB is possible | CTAP is available on almost all Android devices

OID4VP with CTAP looks fairly promising.

Points raised as considerations included:

- Bluetooth's lack of security and stability
- IPR issues when extending an ISO protocol

among other things.

(9-A) RP Authentication and Authorization (EUDIW) (Torsten Lodderstedt, Giuseppe Di Marco)

RP Authentication & Authorization

Why RP (Relying Party) authentication matters

In a digital identity system, establishing a secure relationship with the RP is essential for the following reasons:

- Ensuring legal transparency (mandated by law)
- Enabling communication and legal proceedings when needed
- Checking whether the scope of a data request is appropriate

How authentication and authorization work
- Handling of PID (person identification data) and EAA (electronic attestations of attributes)
- Providers manage data access through disclosure policies
- Policies must match the RP's role and proof of entitlement
- Supporting the user's informed decision

Implementation options
- X.509 certificates
- OpenID Federation
- Attestation via SD-JWT (Selective Disclosure JWT)

The Italian example: using OpenID Federation

Italy has adopted a hierarchical trust model along the following lines:

- Each entity declares its own configuration
- A hierarchical structure of Italian federation → EU federation
- Real-time policy management via trust chains
- Authentication via trust marks

Distinctive points
- Confirming trustworthiness via entity IDs
- Standardizing the query language with DCQL
- Trust conferred by third parties

Issues to consider

One point that deserves particular attention is the reality that not every RP has legal personality. For example:

- International research collaboration projects
- Organizations spanning multiple jurisdictions
- and so on

Beyond the sessions, there were of course side meetings as well:

- A confidential topic I can't disclose yet (Wednesday night)
- The EUDIW sub value and related topics (Thursday)
- A meeting with FAPI co-chair Anoop Saxana about the CFPB response (Thursday evening)
- A dinner meeting about BGIN (Thursday night)
- A meeting with Drummond Reed about the Global Acceptance Network (Friday breakfast meeting)
- A meeting with Edmund Jay (Friday lunch)
- and so on

Incidentally, I was the last person to leave this IIW. Here is the photo to prove it.

While I was in a remote meeting with Anoop, everyone else went off for beers.

See you next week!

Friday, 01. November 2024

IdM Laboratory

IIW 39 Day 3 Quick Review


Hello, this is Fujie.

At last, it's the final day of IIW.


I'm getting pretty tired, my brain isn't keeping up, and I'm starting to have trouble following the English. (I'll be living in English through the end of next week, so I need to take it easy over the weekend...)

Anyway, today was again full of fun topics.


Credential Status Comparison Mechanisms - Paul

This was about credential revocation, including the StatusList work I've written about before.

There really are a lot of approaches, aren't there: good old OCSP, for example...


To begin with, the triggers that make you want to revoke a credential include things like:

- The credential data set becomes invalid (for example, the holder is now over 18)
- Compromise of the issuing system
- Compromise of an algorithm (for example, through advances in quantum computing)
- Compromise of the wallet or keys

The German government is apparently working with Italy and France to compare scenarios and methodologies. The scenarios they have in mind are roughly these:

- The user has died
- The user wants to revoke it themselves (they lost their smartphone)
- The key storage has been exposed through a vulnerability

Solving all of this while keeping scalability and privacy (linkability) in mind is a hard problem...



They are comparing the approaches using criteria like these:

- Credential format (which formats are supported)
- Technology readiness level (maturity)
- Status tracking by verifier (tracking)
- Holder tracking by issuer (tracking)
- Unlinkability (degree of linkability)
- Complexity (0 is high)
- Scalability (0 is high)
- Offline/caching (support for offline scenarios)


There are many points to consider: for example, building a StatusList as a bitstring compresses reasonably well but still has an upper bound on size, so you end up thinking about garbage-collecting it periodically, and about whether batch issuance of credentials can mitigate linkability, and so on.
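As a rough sketch of the bitstring idea (my own illustration following the general "compressed bit array" pattern, not any specific draft discussed in the session): the issuer publishes one status bit per credential, and the verifier checks the bit at the index carried in the credential. A mostly-zero array compresses very well, which is where the size savings come from.

# Minimal sketch of a bitstring status list, assuming a gzip-compressed,
# base64url-encoded bit array; encodings and bit order vary between
# specifications, so treat this as illustrative only.
import base64
import gzip

def encode_status_list(revoked_indices, size=131072):
    bits = bytearray(size // 8)
    for i in revoked_indices:
        bits[i // 8] |= 0x80 >> (i % 8)   # set the bit for a revoked credential
    return base64.urlsafe_b64encode(gzip.compress(bytes(bits))).decode()

def is_revoked(encoded_list, index):
    bits = gzip.decompress(base64.urlsafe_b64decode(encoded_list))
    return bool(bits[index // 8] & (0x80 >> (index % 8)))

status_list = encode_status_list({42, 4711})
print(is_revoked(status_list, 42))    # True
print(is_revoked(status_list, 43))    # False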

What I really like, though, is this way of working: at the comparison stage you throw your thinking into a gathering of professionals like IIW and polish it there. The presenters ask "this is what we think, what do you think?" and a real discussion happens on the spot. Having people from governments and financial institutions do this in a genuinely open forum is hard to imagine in Japan. The Japanese government really ought to come to venues like this (hint, hint).


Digital Credentials API - Tim

Next up was the Digital Credentials API.

The idea is to invoke the wallet via the Credentials API, just as FIDO does, instead of launching it via a custom URL scheme.

So what was wrong with custom URL schemes in the first place?

- The app is launched in an insecure context
- On-device phishing (a malicious app can be chosen)
- There is no requester origin or identity

So the message was: let's learn from passkeys here. The lessons highlighted were that the caller's context is key, and that cross-device authentication needs to be secure, easy, and phishing-resistant. The structure looks like this.
It closely resembles CTAP and WebAuthn in FIDO.
It seems to work like this.
The API is fairly simple, too.
Presentation

let digiCred = await navigator.credentials.get({
  signal: controller.signal,
  digital: {
    requests: [{
      protocol: "openid4vp",
      data: "{request json}"
    }]
  }
});

Issuance

let digiCred = await navigator.credentials.create({
  signal: controller.signal,
  digital: {
    requests: [{
      protocol: "openid4vci",
      data: "{request json}"
    }]
  }
});


You can see a demo here.

https://digital-credentials.dev/


OID4VCI Browser API Issuance Profiles - Joseph, Kristina, Paul

Since it can't serve as social infrastructure unless a wide range of scenarios is considered exhaustively, they are working through quite a variety of use cases.

https://bmi.usercontent.opencode.de/eudi-wallet/eidas-2.0-architekturkonzept/flows/Presentation-During-Issuance/

One of them is the scenario described there: performing issuance based on the result of a presentation.

As an example, they mentioned a scenario like presenting an mDL for identity verification when having a school issue your degree.

In other words, based on the presentation result, a pre-authorization code is issued and OID4VCI is run straight away.
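Roughly, that means the issuer can hand back a credential offer carrying a pre-authorized code once the presentation has been verified. The sketch below is my own illustration; the field names only approximate a recent OID4VCI draft, and the issuer URL and code value are made up.

# Hypothetical credential offer returned after a successful presentation
# (field names approximate a recent OID4VCI draft; all values are made up).
credential_offer = {
    "credential_issuer": "https://university.example.edu",
    "credential_configuration_ids": ["UniversityDegreeCredential"],
    "grants": {
        "urn:ietf:params:oauth:grant-type:pre-authorized_code": {
            "pre-authorized_code": "adhjhdjajkdkhjhdj",
        }
    },
}
# The wallet exchanges the pre-authorized code at the token endpoint and then
# calls the credential endpoint, i.e. the normal OID4VCI pre-authorized flow.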


Here too, the discussion got lively around questions like how to handle the case where the wallet that held the credential used for the presentation is different from the wallet that the issuance targets.


OIDC4VCI Credential Versions / Purpose on presentation - Oliver, Daniel

This was the last session, and again quite a niche topic.

In the first half, Oliver talked about what it would mean to version credentials (including how to prove that a given version is the latest).

That said, "version" can be interpreted in many ways, so the session only got as far as sorting out those interpretations and discussing them.

- Same claims and values, but bound to a key on a different device, or issued separately
- Same claims but different values, because a value changed (for example, your frequent flyer status changed)
- Same claims but different values, because they concern a different person or thing
- Different claims and different values, because it is an entirely different credential

Who triggers re-issuance? Would notifying via Shared Signals be better? And so on...
In the second half, Daniel covered the purpose specified in the query language (PE or DCQL) within a presentation request. (Another deep-cut topic.)
At the moment purpose is specified per credential, so questions came up such as: isn't there no purpose covering a credential set as a whole, and how should multilingual support work? The discussion got lively around these, and again we ran out of time.

And with that, the three days flew by.
Next up is IETF, so I'm now heading to Dublin via Germany...







Michael Ruminer

VSCode and Debugging Python in Virtual Environments

AI generated image of a beetle

This post is really for me more than anyone else. I will be able to come back here and find it. If it helps you as well, then great.

I have historically not been a Python programmer. Additionally, I have historically used Visual Studio versus VSCode. Lately, I have been doing Python in VSCode. I need to learn my way around.

I recently figured out how to debug Python with breakpoints and watches in a virtual environment, or at least in the default “-m venv” virtual environment, much as I would in a different language in Visual Studio.

You can have the debug window create a launch.json in the .vscode folder of the root workspace directory or create it yourself from scratch. Here is one of my modified ones.

{
// Use IntelliSense to learn about possible attributes.
// Hover to view descriptions of existing attributes.
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
{
"name": "Python: crewai series day_04",
"type": "debugpy",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
"env": {
"PYTHONPATH": "${workspaceFolder}\\crewai_series"
},
"cwd": "${workspaceFolder}\\crewai_series\\day_04\\src"
},
{
"name": "Python: crewai series day_05",
"type": "debugpy",
"request": "launch",
"program": "${file}",
"console": "integratedTerminal",
"env": {
"PYTHONPATH": "${workspaceFolder}\\crewai_series"
},
"cwd": "${workspaceFolder}\\crewai_series\\day_05\\src"
}
]
}

The real trick is to set that PYTHONPATH property to your folder with the virtual environment folder as a top-level subdirectory.
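If you want to sanity-check that the debugger actually picked up those values, a few lines at the top of the script you launch will show it (standard library only, nothing specific to my setup):

# Quick check that the launch.json values took effect.
import os
import sys

print("interpreter:", sys.executable)          # should point into your venv
print("PYTHONPATH :", os.environ.get("PYTHONPATH"))
print("cwd        :", os.getcwd())             # should match "cwd" in launch.json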

Bonus: you can set up different debug profiles like I did above and shift between them in the debug window. That is helpful when you have different virtual environments in the same workspace.

That’s it. That’s the post.

Thursday, 31. October 2024

Doc Searls Weblog

The Kids Take Over

While I am extremely pleased and grateful that 26 years of writing on Linux Journal survive online without being 404’d, I also realize that this condition probably won’t last forever. Also, some pieces are now missing their images and other graces. This is one of them. It is also one of my best, I think, Or […]

While I am extremely pleased and grateful that 26 years of writing on Linux Journal survive online without being 404’d, I also realize that this condition probably won’t last forever. Also, some pieces are now missing their images and other graces. This is one of them. It is also one of my best, I think. Or at least one of the most important. This is a late draft, rather than the published final text, but it’s still good. Read on and enjoy—Doc

It starts here, in the heart of Long Island, a couple dozen exits east of Queens. I saw it with my own eyes in Mineola’s Public Schools, where kids, led by a nonprofit called kidOYO (“Kid-oy-yo”), are learning to program in different languages on different computers and operating systems, creating and re-creating software and hardware, with fun and at speed. Their esteem in themselves and in the eyes of their peers derives from their actual work and their helpfulness to others. What I saw was also sure to strip the gears of any system meant to contain them. Mineola’s schools were not among those.

OYO means Own Your Own, and that’s what these kids are learning to do. In geekier terms, they are rooting their own lives online. They’re doing it by learning to program in languages that start with Scratch and move up through Python, Java, C# and beyond. They’re doing it on every hardware and software platform they can, while staying anchored to Linux, because Linux is where the roots of personal freedom and agency go deepest. And they’re doing in all in the spirit of Linus’ book title: Just for fun.

With kidOYO, the heuristics go both ways: kidOYO teaches the kids, and the kids teach kidOYO. Iteration is constant. What works gets improved, and what doesn’t gets tossed or replaced. The measures of success are how enthused the kids stay, how much they give and get energy from each other, and how much they learn and teach. Nowhere are they sorted into bell curves, given caste-producing labels such as “gifted” or “challenged.” Nor are they captive to the old report card system. When they do take standardized tests, for example the college AP (advanced placement) ones for computer science, they tend to kick ass. (At that link we see how nearly all the 6th-9th graders who took the test passed, meaning they were ready for AP college work.)

kidOYO is the creation of the Loffreto family: Devon, Melora, and their son Zhen, who is now 12. What started as a way to teach computing to Zhen turned into ways to teach computer science to every kid. Their methods resemble how the Linux kernel constantly improves, stamping out bugs and iterating toward ever-expanding completeness, guided by an equal mix of purpose and fun.

Before we met, I had assumed, from Devon’s writing style and deep knowledge of stuff, that he was a gentleman perhaps of my own age, or even older. So I was surprised to find that he was not only a youngish guy, but a New York state high school champion baseball and basketball player who went to college on a sports scholarship. Also that he looked like a stunt double for George Clooney.

I also knew what he and kidOYO were doing was important. But my mind wasn’t blown until I finally obeyed Devon’s invitation to see their approach at work. That happened on Groundhog Day in February. (An album of pictures I took on that visit is on the Linux Journal Flickr site here.)

Mineola is about as prototypical as a middle class New York suburban town can get: a 2-square mile village of about 20,000 in the center of Nassau County, which lays between Long Island’s north and south shore and is home to about 1.5 million people. The Mineola Free Union School District, however, is anything but typical. I’ve never seen a public—or any—school system with its feet equally planted in the digital and the physical worlds, or as eager to run forward in both. For example, all three schools I visited had created social and hacker spaces within their libraries. The books and the stacks still mattered, but so did the ability of kids to research, learn and teach together using computing and related gear, such as 3-D printers and programmable robots.

Standing in the Coding Center at the Mineola Middle School, surrounded by kids doing amazing stuff on their Chromebooks, Dr. Michael Nagler (@naglersnotions), superintendent for the district, gave me the backstory on how kidOYO got involved.

“Three years ago my wife signed our son up for a coding class these guys were putting on,” he said. “So I drive my son out there, and I’m watching what they’re doing, and I’m impressed. I ask Dev, ‘Why aren’t you in schools?’ He says, ‘The schools won’t talk to us.’ So I say, ‘Well you’re in luck, because I know a guy.’ We worked to help adapt their platform for schools, starting with ours. And I mean all of ours. We jumped in the deep end, starting with the little kids first and pushing it up through high school. And now we’re on this three year journey, so far, during which everything changes. Constantly. The little ones get the skills, and they roll up. Now I have to adjust my next level, and do it waaay faster than I have to with any other curriculum. Right now, for example, for the AP Computer Principles course in high school, they’re doing the learning path for (hatchcoding‘s) Hatch 1 and Hatch 2. Meanwhile, my sixth graders are already finished with it. So by the time these sixth and seventh graders get to ninth grade, my expectation is that every student in the district is taking AP Computer Principles. That’s going to replace our Exploring Computer Science class. And then we build in connections. So we’re doing Arduinos here in the Middle School’s sixth grade, and simultaneously in ninth grade in the high school. Then, as the younger kids move forward, we’ll change the ninth grade setup.”

Since Maker Faire New York is a great place for kids from everywhere to show off their maker chops, I asked Dr. Nagler if they had plans for that.

“We merge CS and computational thinking with making. We have a whole design and creative thinking framework tied to our mascot, the mustang. We make ways for the kids to conceptualize, design, iterate, prototype, test, refine, go, back, and build things.”

I asked, “How do you deal with the combination of kids who are already on this path, kids who want to come in and need to catch up, and eventually everybody in the school doing AP level work on computers? And beyond that, how does this whole thing catch fire?”

“A couple of ways. First, it’s not an elective. Here in Mineola, every kid has to do it. They also have to do it in their subject classes. So we tie a coding project to a curriculum project. Every grade has to do three a year. So we teach a language that way. We teach it independently the OYO way. And we teach it the formal way, cycling kids through CS classes, for example here in this room. As for catching fire, we’re Ground Zero. We succeed here and it spreads to other districts.”

“How do you all scale? I mean both what you’re doing and what kidOYO is doing?”

“I think we’re unique in that I don’t want it to be a formal class. I want CS to be ingrained in everything we do. In the process we’ll refine it and share it in ways that can be adopted by other districts. I’m a big open source guy. Sharing is key. So I’m taking the kidOYO platform and building an open computer science curriculum in social space. The beauty of their platform is that it lets me build an OER—Open Educational Resources—using their concept of learning paths, which we also work on together. Dev also built me a website that I can send to an organization I belong to called the League of Innovative Schools, which is a national organization. We can crowd-source content there. For example I built a sample curriculum unit. I can push that out to various states. By crowdsourcing we already have a ton of content on there.”

At this point Devon joined the conversation. “Tell Doc about MC².”

“Right. It stands for Mineola Creative Content, and it’s a video production studio, where we do fun learning videos, which are a basis for the learning pathway here.”

The opening text on the site (https://mc2oer.oyoclass.com/) explains, “This community showcases open educational content and other materials from the Mineola School District. Mineola is a suburban district located about 30 miles outside of New York City. Our school district is dedicated to the #GoOpen movement which supports sharing educational resources.

“It’s all about #OER—Open Educational Resources—and open source,” Dr. Nagler explained. “We use the videos here in the district, and also throw them out to the world where everybody can use them.”

Look up “Dr. Nagler” on YouTube, and you’ll find lots of them. He’s the star, as both a mentor and an animated character. There’s even one video where he talks with his animated brain, bearing his signature goatee.

“An important context is that there is no central repository of educational materials in this country, because they’re all locked up by proprietary publishers. What we’re doing here is a way to get around that. And I have a lot of flexibility. I can market MC² as a school district entity, and not worry about all the copyright crap. It’s all made to share.”

I asked, “What happens to the world when these kids graduate into it?”

“They’re going to change the world. That’s clear. We’re also all dealing with astronomical change in the technical environment along the way. Constantly. This makes everything very hard to predict. Look at my 2019 high school graduates. They started Kindergarten in 2006. Even from just 2006 to 2009, the technology advances were astronomical. And then look what happened in the next ten years. Huge. So if I start planning now for where Kindergarten kids will come out at the end of the next twelve years, I’m already lost. But if I trust the process we have in place already, I’ll be fine. We’re driving it, and the kids are driving it too. It’s a constant cycle.”

I replied, “We also live in a world where giant companies are also working to contain those kids’ agency inside corporate silos. Some of those silos also spy on everyone constantly. How do you deal with that?”

“The common denominator is CS, and the flexibility within it. There’s freedom in that. I’m not going to force you to master, say, just one language. I’m going to get you on a platform where you can play with any and all of them, learn quickly and well, and apply whatever language you like toward building something. And because we’re merging the making and the coding, your next question will be, ‘What will this code do?’ The answer is, computational thinking will always push you toward solving problems. If you look at the big picture, content already is readily available to every kid. And content has always been our specialty, as a school. But with CS, the kids learn to master that content, in many ways. That’s key. Kids need to know and feel that they’re on top of things. That they Own their Own.

“What about curricular necessities: mandates that come down from the federal and state level?” I asked.

“We’re still a public school, and we do have formalities. For example, here in New York every kid has to pass the state Regents Exam. We teach to that, but we also make sure there’s no way a kid graduates without exposure to computer science.”

“And you trust that’s going to equip them, once they’re out.”

“It’s more than that. Working with kidOYO, we’ve developed something that not only should be replicated everywhere, but needs to be. Here’s the important thing: there aren’t enough people who know computer science who can also teach it. So when you figure out a way to do it virtually, to scale the knowledge outward for everybody, it’s a big deal. The investment I make here probably costs me one teacher’s salary. But it scales to the whole district. To scale evenly, you can’t beat it. In fact it’s the only way to scale up computer science through schools, because the current credentialing system is too slow and too top-down, and formal training is too slow, too far behind the curve. The kids and their mentors are moving too fast for that.”

The biggest absence I saw that day was anything that looked like a bell curve.

What I saw was a system without A+ stars at one end and failures at the other, made so every kid progresses and every kid succeeds. A few minutes earlier, Melora had explained, “We have no lowest common denominator, because everyone succeeds. There are twelve-year-olds in this program that a 7th grade teacher wouldn’t look twice at in an ordinary classroom, but who prove themselves to be profound programmers. And choice is key. When Dr. Nagler brought in this program, it wasn’t just for a select few kids. He wanted it to be open to everybody. And everybody has the ability to choose anything they want. It’s a totally different ecosystem than you’ll find anywhere else. And he’s gracious enough to reach out to other school systems to help them break down their own classroom walls. One of the things he preaches is that you have to believe. That’s a requirement of being on the cutting edge. The failing forward principle works for everybody too. It’s a model that works.”

The spirit of helpfulness and failing forward also fosters kids’ confidence that they can weigh in with solutions of all kinds. To show me how that works, Devon took me over to a table where Jordan Chaver and Connor Scott, a sixth and seventh grader, were working together on something.

“These two guys,” he said, “are your app builders. They came with us out to Stony Brook University for one of our software programs there. Jordan pitched building an app on iOS, which he already knew how to do. But there was not a single mentor in the room who knew what he was trying to do—legitimately, because in university CS they don’t want to work in a closed environment. So we transitioned the challenge over to the Web: to instead make a Web-based app with database functionality. And that’s what these guys are building. And there isn’t just one app. There’s one they call social-emotional, and another called Class Dash.” Asked to demo one, Connor pulled up a Chromebook, angled it toward me and said, “Let’s say you have a research paper. One that’s big and complicated. And you press Submit. Behind this you have something kind of like Dropbox, where you can share documents.”

Devon jumped in to explain, “They’re sharing all their class assignments in a firewalled white spaced environment where they don’t have access to their emails. So this is a simple way of sharing inside that environment.”

Connor continued, “You also have this five-character ID code. Jordan can type in the code, and he gets the same exact document. So can anyone else with the code. The idea is to share something with the class in a way that avoids complications. We’re also in a class play, Once Upon a Mattress, which is based on The Princess and the Pea. I’m the Prince and Jordan is the Wizard. So Jordan made this schedule for all the performances, where you can buy tickets, and so on.” On his Chromebook, Jordan showed me his page with the schedule next to a graphic of the play’s title. He then gave Connor the five-character code for access to the schedule, and it came up on Connor’s Chromebook. (A picture of that is here.)

Connor again: “Right now I’m adding a way to lock a document. Let’s say that Jordan is the teacher and he finds a spelling error in my document. I’ll add a button you can click on and see if anybody has updated the document.”
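As a rough illustration of the sharing scheme the boys described (my own minimal Python sketch, not their actual code; their real app runs as a web app inside the school’s environment and every name here is hypothetical), the five-character code is simply a key into a shared document store:

import secrets
import string

# Shared document store: five-character code -> document record.
documents = {}

def submit(title, body):
    """Store a document and hand back a five-character share code."""
    code = "".join(secrets.choice(string.ascii_uppercase + string.digits) for _ in range(5))
    documents[code] = {"title": title, "body": body, "version": 1}
    return code

def fetch(code):
    """Anyone holding the code gets the same document."""
    return documents.get(code)

def update(code, new_body):
    """Edit the shared document and bump its version so readers can tell it changed."""
    documents[code]["body"] = new_body
    documents[code]["version"] += 1

code = submit("Research paper", "First draft")
print(code, fetch(code)["title"])  # e.g. K7Q2M Research paper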

Jordan said, “Let me tell you more about Class Dash, which I did for Stony Brook. It’s a student-teacher companion app. It has multiple uses, but the one that’s currently available is called Schedule. It covers notes, teacher, room, and supplies. I play drums, so drumsticks are an example of supplies. I also have Instant Messaging Teacher. The idea is, if you have a homework question, instead of emailing the teacher and getting a response the morning after, the teacher gets a push notification on their phone.” Class Dash will first hit the market in April as an iOS app. Other versions will come after that.

Joseph Malone, also twelve, is at the same table, hacking AI algorithms. Devon said, “Joseph here is spinning up his own virtual machine and generating algorithms to train his AI to run his scripts. He’s going into OpenAI, playing with AI algorithms, modifying them, and putting them to use. It’s neat stuff, and it’s also huge.” Melora told me Joseph is also helping out by volunteering a stream of challenges, solutions and badges for kidOYO courseware. “He does all the work himself, and makes it open and available to everybody.”

“We’re fully networked here,” Devon added. “No need for back-end support.” Meaning no external corporate dependencies. kidOYO and its participants—learners (they aren’t called students), mentors (they aren’t called teachers), parents, schools—all work together, and for each other, as a “community of communities.” They’re also not moving at the speed of anybody’s clock, or anybody’s class. Though they’re sure to change the world, that’s not the goal. In fact, there is no long-term goal. The journey is truly the reward, and the journey is called the learning path. That’s what matters, and it’s not seen, or built, as a way to plow through the status quo. Even though that’s one of the things it does. Neither Mineola nor kidOYO want to burden kids with anything at all, other than the need to master their digital worlds, and to constantly advance their mastery.

The Middle School was the second one we visited in Mineola. The first was Hampton Street School, which is Pre-K to 6th grade. There we saw clusters of five- and six-year-old girls and boys in the library’s Coding Center, hacking away on school-issued tablets using Scratch, which is free (as in both liberty and cost), open source and runs on anything. They were also doing this both by themselves and collaboratively.

The way kidOYO works, all the kids know they are working both to expand their own skills and those of other kids as well. There are also rewards along the way, such as on-screen fireworks and badges. After a bit of working on their own, the kids’ work is shown on a screen for review by each other and Melora, their mentor. (The learner/mentor relationship is central to the kidOYO system, and practiced in the Mineola school system as well.) Devon later explained what was going on: “Melora was reviewing the process of getting challenge submission feedback from mentors, as well as introducing them to a new app called Sprite Editor that we recently released for kids to create art they may want to add to their Scratch, Python or Web-based projects. Often it’s their own video game character art.”

When one boy failed a particular challenge, he embraced it, knowing that FAIL means “first attempt at learning.” Three girls came over to help the boy out. It was interesting to watch how they knew their job wasn’t to jump in with the right answer, but to help the boy learn what he didn’t know yet, so he would have the satisfaction of succeeding for himself. This was far more sophisticated and mature behavior than I would normally expect of kids so young. Instead I would have expected kids that age to show off what they knew, or to one-up each other. But that’s not how the kidOYO approach works.

Have you ever played the red/black game? I remember it as an exercise the human potential movement used to teach in weekend retreats and workshops to show there’s more to be gained from cooperation than from competition. The idea behind the game is to reprogram adults so they value cooperation as well as competition. My point in bringing it up is that it’s hard to teach adults how to deal with each other in ways that are as empathetic, helpful and vanity-free as what I saw as normal behavior among these little kids.

At Hampton Street, Devon spent most of his time working with a second grader named William Ponce, who was clearly grooving on what he was doing. Later, Devon wrote to explain what was going on:

Here is William Ponce’s portfolio. Every kid has one. You can see badges he has earned. If you click on one of his “Mastery Badges” you will see the “Learning Pathway” that he navigated in earning it, displayed as evidence in the badge. Clicking on the micro badges in the evidence will show you the badges earned on the way to the mastery badge.

Here I was helping William earn his first Mastery Badge. Since we left that class, you can see he has earned two more already!!

Our third stop was Mineola High School, which has a fab lab and manufacturing facility. “We actually source product from them,” Devon told us on the way over. “For our store. Coding is the underlying infrastructure, but it’s applied everywhere.”

The Fab Lab is beyond impressive. It’s as big as a lumber yard and has lots of machinery, materials, and students making stuff. Ken Coy, who runs the lab, explained, “We do it all. Welding, electronics, coding, Arduino, hand tools, computer tools. We bring it all together here. We have all the old traditional tools you’d have in wood shop days—drill press, band saw, lathe, tools for sanding—plus all the new stuff that’s both manual and computer controlled. Large format printers, laser cutters…”

When I asked him about Linux, he brought me over to the shop’s Linux CNC (Computer Numerical Control) computer, running on Ubuntu and attached to a Probotix controller and a router. (Not a network router, but a powered woodworking tool that cuts with bits or blades.) In the design class space, Andrew Woolsey (@WoolseyDesigns) showed me a CNC-controlled laser cutter where the students were tracing, carving and printing out parts for art projects, signs and much more (which occupied students on adjacent tables). He also showed me a printer as wide as a piano churning out student portraits and posters of amazing quality, including ones for the Mineola Robotics Team (@mineolarobotics), which is apparently (judging from the awards and posters) always competitive. I don’t often see stuff that makes me wish I was fourteen again, but Mineola High School did the job. Walking around the fab lab, the library and the halls, I didn’t see a kid who wasn’t upbeat and engaged, or a teacher who wasn’t the same.

My mind blown, I followed up that one-day tour by asking Devon and Melora a lot of questions that amounted to an interview. Here it is.

DS: How much about Linux, including, eventually, operating in command line mode, do kids pick up, and at what stage? I also want to know how kids learn to be masters across platforms: from Linux to Mac, Windows, Android and iOS, because I saw and heard all of those discussed in the schools we visited.

DL: It varies by program type and mentor interaction. In schools, it’s limited. While schools are using Chromebook computers, which can be dual-booted into a Linux Ubuntu environment, this is generally disabled. Some schools will use Raspberry Pis that allow for an introduction to the Linux OS, but it’s still rare. To some degree they are limited by the gear they get, and the familiarity with Linux of the mentors. But we support Linux with curriculum whenever it’s possible. And we do our best to push schools in the Linux direction.

In kidOYO programs we run in our own community, outside of school classrooms, Linux engagement is higher. Here it scales with the complexity of lessons and the skills of students and mentors. This past semester, our program was hosted at Stony Brook University, and we had ten to sixteen year old students learning to create projects within Linux Ubuntu and GNU Linux OS, as well as Mac plus Windows by installing the Windows Subsystem for Linux. They can set up their dev environment with basic shell commands, using Vim, gradle, flask, etc. Skills and proclivities on this path vary widely, but I will say that outside of game development skills specifically, most high level learners tend to be Linux proficient as well. It’s just a corollary I’ve observed over many years. For his Demo Day project to show parents, one kid built a multi-player version of Space Invaders in Java using libgdx on his machine running Linux Ubuntu. He did this at age eleven.

Since our platform runs on Linux, and my core team also serves as mentors to kids in these programs, we always enable students to see behind the scenes, and learn why we use certain tools, such as Linux and Vim. To a kid at first this stuff is not obvious, or necessarily relevant. Still, as they advance we show them how Vim works and how it is used, then create challenges for them to try it out. Some kids are tuned to this innately, and once they touch Linux and root dev methods via the command line, they never go back to the GUI. Others fight it, as you might expect, and prefer GUI computing. Once kids self-identify as interested and proficient in an area like this one, I try to support them in sharing their knowledge and interests with others who might also like to learn. Here is one such example, created by a 12-year-old student who only uses Linux machines.

DS: What is it that makes kidOYO’s approach so different?

DL: Our main goal is to influence the way learners—not students—approach learning. This affects both students and teachers, as we create opportunities several ways: one in the context of indie classrooms that our organization leads, another in K-12 classrooms our mentors support, and another in the context of professional development, or PD, allowing teachers to self-provision their needs as a peer-group and as a school district. Our platform serves as a foundation for STEAM (science, technology, engineering, art, math) learning across a diverse landscape of tools, methods, toys and concepts, all of which are about self-led learning, which goes beyond pedagogy.

It’s not without a fight. There are plenty of pedagogy-centric thinkers in the ranks of teachers and administrators, and plenty of methods to support their priorities. For example, pedagogy-oriented thinkers tend to complain when we give kids real world tools like their own DNS (domain name) + hosting + editors. Instead they prefer limiting the use-context of tools, so the kids can’t publish online and complete the feedback loop. Instead of letting kids build their own learning portfolio, they prioritize templated tools with limited functions and limited downstream value for independent learning.

DS: Tell me about your approach to business and open source, because I think it’s unusual.

DL: We think open source needs to be free as in freedom, or liberty. Not that a business based on it has to be free of cost. So we’re all about free code, but we charge for our services. And that’s how we’re bootstrapped. Our services provide nearly all our income. Parents and schools pay us because they get value out of what we do. Even though kidOYO is a nonprofit, that’s our business. And paying for value is part of what makes us a community as well.

Meanwhile, we’re up against corporate giants who may or may not be about open source, but they do give away lots of stuff for free, either outright or on a freemium model. Smaller edtech operations with superficially similar businesses also have a FREE come-on.

DS: But some of your services are free, no?

DL: Yes. With codeLI.org and FredXcoders (in Fredericksburg), all content is created by volunteer mentors, and the depth of engagement tends to be exploratory in nature. These communities run cost-free live events, and curriculum resources are produced to support independent engagement. Where that model struggles is with sustained mentor engagement. That’s why we have a member:cost model.

In order to have sustained mentor feedback loops, kidOYO runs member:cost communities. Mentors in these are compensated for their support of student learning. This allows increased diversity in learning progressions for students, and has served as the way we’ve generated revenue for our organization, and funded building all these tools and relationships over the years.

Here we have an opportunity for the platform to support “franchise” community groups. Each franchise group can set its own cost, manage community content, and structure mentor-student memberships locally based on permissions and rules set by community owners. As a non-profit, this becomes a business model capable of provisioning multiple services under the membership definition. IRS rules allow for membership fees of up to $75/year without any reporting requirements.

At kidOYO, we are also about to unveil our direct-to-customer model of memberships, which will be priced at $75 per year, plus the cost of curriculum pathways chosen by members. An OYOclass.com account is always cost-free and allows for data access/ownership/downloading, plus project portfolio and badge portfolio. Joining a community has rules set by the “community owner.” So codeLI.org is FREE, but kidOYO has costs.

Membership in kidOYO will come with some additional benefits, such as discounted member pricing for our live Fall/Spring/Summer events and our swag store. We are currently building an OYOclass API to support this across domains.

Schools are currently paying $35 per year per account at a minimum of 1250 accounts. This represents our charity pricing, with zero margin, as an unfunded non-profit supporting tech and live personnel services. We are lowering the minimum to 500 accounts at $50 per account right now, and hoping we can do even better as our efforts scale. Our original goal, with zero accounts using the platform, was to get under $100, and our new goal is to achieve $20 per account. As a combination of tech plus live personnel services, plus data privacy/fiduciary under legal contract, I think this is a strong value proposition.

DS: Unpack your corporate structure a bit more for us.

DL: kidOYO is a registered trademark for educational software, first put into use in 2006. It functions as a DBA for NoizIvy.org, a 501(c)(3) nonprofit, which was founded in July 2001 to empower “entrepreneurial learning with technology through creation of products, services and programs in our local communities driven by peer-to-peer exchange and empowerment.” In that role we founded FredXcoders in 2011, in Fredericksburg, Virginia. We also started CodeLI.org in 2012 and LI Maker Fest in 2015 (now the official Eastern LI Maker Faire), launched the OYOclass.com platform in 2014, and entered our first school district in 2015. That was Mineola. Now we’re approaching 75,000 members on Long Island, New York, and preparing for national and international launch.

I’m the President of the company. Also a Developer and a Lead Mentor since 2001. Melora is the Executive Director. Also Creative Developer and a Lead Mentor since 2001. Bo Feng is a Software Engineer and our CTO since 2014.

Nine people work at NoizIvy.org currently. The three I just named are the core team. We also have four software developers who are also mentors, plus one maker engineer/mentor and one office manager, administrative assistant and mentor. Everyone in the company learns to code if they don’t know how already.

We also have one print book, self-published in 2006, and available online for twenty dollars.

DS: Tell me more about your business. What do you charge? How do you set your price points?

DL: We actually start with the Seventh Amendment of the U.S. Constitution, which says “where the value in controversy shall exceed twenty dollars, the right of trial by jury shall be preserved.” We all know that amendment as the one giving us a right to trial by jury. But what it actually presents is a twenty-dollar bar against triviality, and a distinction between civil rights and market rights. There’s space there, under that twenty dollars, or the modern equivalent, for you to present a small civil case in a courtroom. Within that space is an existential gain in empowerment. It’s also where we have evaporated our civil rights. So we look at our market, and mark up from that twenty dollars to thirty-five dollars. That’s what we charge per student. Again, it’s pretty simple. And the student gets a lot of value for that money. So does the school, if they’re paying.

Now let’s go back to free-as-in-beer, or gratis-type free. Giant companies are taking huge advantage of free, with services consumers get for free while other value is being extracted from them. That’s one issue. Another is the Trojan Horse of free corporate goods and services to schools. Another, for us, is relevant to Linux and open source. In many cases schools are barred at the union level from accepting free services, at least in New York State. But our service isn’t gratis, so we’re cool with that.

DS: Have you taken in payments from any big companies?

DL: We have taken in ten thousand dollars in funding over the past years from two corporate entities. Red Hat gave us five thousand, and the other five thousand came from North Atlantic Industries, a Long Island manufacturer of electronics components used in military applications. That is it. We do everything on our own, in relationship with the community we serve. Parents and education leaders drive kidOYO. And the kids too.

DS: Is there a name for your category?

DL: There are two. One is #IndieEDtech. That’s the hashtag. The other is creative computing, which comes from the MIT Media Lab’s Lifelong Kindergarten and their Scratch perspective on CS.

DS: Is there a list of what we might call competitors here? Or just projects and companies kind of in the same business, or with similar goals?

DL: The FreeCodeCamp model exists in abundance, but it’s not really competitive. While it’s a free thing anyone can access, they spend most of their online real estate begging for donations. I’m not really down with the “begging” business model. Ask or beg for twenty dollars and it’s the same outcome. When you do that, you get trounced on by behavioral manipulation outcomes that distort the meaning of free-as-in-liberty initiatives. Codecademy, CoderDojo and Code.org all have different business approaches, but all use FREE-as-in-beer as a tool. When you look at the downstream relationships among their funding sources, however, more meaning is revealed. There’s a big donor status quo behind them.

#indieEdtech and kidOYO stand apart in the creative computing camp. Nobody else has our advocacy for public-benefit technology and skills that individuals own root authority over.

For a long time, our messaging was unlike any other. Only recently has the messaging of Apple turned in our direction on the data privacy side, but we’re still dealing with “corporate EDU” having different priorities than #indieEdTech. So we do compete, in different ways, with Apple, Google, Amazon, Microsoft and Salesforce, all of which have their own corporatized approaches to teaching kids computing. We compete with all of them as a small, or even tiny, non-profit. And we do go up against these behemoths every year, in every school, in every program offering we produce.

Because we’re independent, we can be very competitive. We are beating the big corps in direct contract negotiations. Their FREE approach goes head to head with our cost-with-integrity one, and we often win.

And, to be clear, we’re not always against what they bring. We’re glad to make use of the iPads and Chromebooks the schools get from big donors. But what we’re about is utterly independent of every behemoth’s businesses and goals for themselves.

DS: How about foundation or government funding?

DL: Something like $1.3 billion got pledged in support of CS education in 2018, and I expect kidOYO would be rejected by all of these sources of funding. I say that based on experience. We always get shot down. That’s because the goals, whether explicit or veiled, are for corporate expansion of influence and methods, not for fostering general adeptness of independent learners. So we no longer bother.

DS: Tell me more about how things are spreading geographically on Long Island, and how that models growth, both geographically and in other ways.

DL: You ask at a good time. We had a meeting with ten school districts yesterday. It involved superintendents, assistant superintendents of curriculum and instruction, directors of technologies, and lead teachers that focus on integrating kidOYO “code, make, own” in various parts of the middle school experience: grades five to eight.

The context was the first inter-district competitive hackathon on Long Island. Ten teams of eight students and coaches will come together to engage in a social competition framed around “creative computing.” CS, engineering, making, art, game development, math, English, science and data-data-data will all be in the room during this event. And yet, “creative computing” among those peers will own the spotlight.

This is significant because this area of learning—talent development and discovery, social creativity with imagination and empathy—stands out starkly against the historical basis of “computer science,” which is defined by its focus on research.

In so many ways, creative computing is enabling a different perspective on technology. Old school CS people do not present a developmental pathway I am excited about advocating to my kid, let alone yours. I find that too many people on this path put technology in front of humanity far too regularly. Can’t drive? Build self-driving cars. Can’t cook? Build UberEats. Can’t generate human empathy and personal imagination? Access sci-fi novels and support dystopian narratives of diversity in culture-producing outcomes.

On the other hand, creative computing is a paradigm shift away from that. It opens up a new context for the human-technology relationship: one that pursues framing on the basis of functional outcomes.

As we prep for this kind of event, we talk and discuss the creative computing perspective with education leaders that rarely get exposed to it otherwise. Every opportunity to bring this perspective to institutional players is a powerful step forward.

DS: Tell me more about your pitch. It must be interesting, because what you do is so radically different. And your website isn’t clear enough to me.

DL: Actually, what I have on the site is misleading, by intent. I’ve been throwing competitors off from swiping our concepts for years. I only have specific things on our sites that we can point at and clarify in personal conversations.

kidOYO.com only processes registrations for live events, which funded our operations for many years. OYOclass.com only sells a narrow description of what our platform can do, and sales only happen face-to-face. Our community of members and active users have a very unique and personal experience within our tools and with our community experience.

In essence, kidOYO is a creative computing and entrepreneurial learning management system. It is composed of distinct components, allowing it to have a multitude of market-based utilities and relationships. These include:

- Personal learning tools, curriculum, mentor-feedback loops, and data-management capabilities over portfolios, micro-credentials, DNS, and web hosting configurations used by young learners engaging in “coding, making, owning” subject matter.
- Community infrastructure for entrepreneurial leaders bringing FREE and cost-based service relationships to their business or charity.
- School infrastructure used for CS/Engineering/Entrepreneurship instruction and Professional Development, as well as for portfolios, micro-credentialing systems (which replace report cards in some cases) and challenge-based learning.
- University infrastructure for Creative Computing/CS/Engineering/Entrepreneurship, providing every student root creative tools and data control across diverse domains of study, including a micro-credentialing system, portfolios, domains + hosting, and more.
- Mentoring system, for direct feedback loops useful in community education with a focus on creative computing/CS/Engineering.
- Micro-credentialing platform, supporting open badges used in K-12, University, and community programs.
- Portfolio system, for use by students/teachers in documenting their own creative learning work for University admissions or workforce outcomes.
- Business model infrastructure—our most aggressive push, to change how community-based groups, businesses, schools and universities consider their evolving relationship with “customers” in a world where all participants own root value.

As you see, our biggest challenge has always been describing what we do. We do a lot, and it’s hard to say it briefly.

I am also fond of pointing out that, as a 501(c)(3) leader and developer of entrepreneurial outcomes, charity and entrepreneurship start the same way: when someone volunteers to solve a problem. We’re trying to build the creative infrastructure that supports those people, on a foundation that respects individuals as the root of community value.

DS: I’d like to know more about kidOYO’s background. I know you and Melora are from Long Island, and you’re here now; but you got your start with a toy business in Virginia. Tell me how you got from here to there to here again.

ML: I grew up in what many would consider an unconventional household.  My Dad was a unique mix of tinkerer, artist, dare-devil and outright crazy person. I am not sure he was prepared to have a girl to be honest. That produced…well…me.

It’s relevant to point that out first, because his unconventional approach to parenting created someone who was never raised with gender “limitations.” I spent a good deal of my childhood immersed in projects in his shop out back. Although he was more than capable, he rarely created for the sake of practicality, so there was never any fear of failure mixed into anything he or I did. The freedom to fail and laugh at those failures helped to shape me into someone who jumped into any project I found interesting without worrying about the end outcome.

I had a tremendous amount of freedom, not being defined as a girl or a little kid in my environment. I could use any tool, any material, any method and make for the sake of making. I was trusted to fail through my own process of exploration when I was with my Dad. I was able to learn by doing and playing and that is how I approach my life still. I love to create new ways to learn through play.  I studied Psychology at Stony Brook University, and when I took my first cognitive psychology class I was hooked. Really, everything I have done since then has been connected to the idea of teaching new skills to young brains.

I also learned along the way that I am a visual, hands-on learner. I think in pictures, so having the power to create digitally and with code has been so empowering for me. It gives me a unique ability to convey complex concepts to kids because I automatically turn problems/solutions into pictures.

DL: So I went to high school on Long Island. Melora and I met there too, though we didn’t get together for good until much later.

In high school, my aim for college was to play baseball and basketball. After a very successful time at that in high school, I hoped to get a scholarship at Columbia, but I got injured. I recovered well enough to get a scholarship to George Mason University, but there my interests became more academic and entrepreneurial, especially around technology.

That interest began with my first programming experience, which was on a Commodore 64 and soon after on an Apple IIe, when I was around nine and ten. Neither of my parents had any technical proclivities, but they put those things in front of me and I took advantage of them.

In high school, I discovered the magazine 2600 and how to hack into phone networks to make free calls using low-tech methods like a whistle to simulate tones unlocking long-distance calling capabilities. In terms of programming, I started in a Windows environment before Linux came around and I just had my own local environment to play with. I also ordered a CD-ROM on programming in C around that same time and played with that.

It wasn’t until 1992, when I entered college in Fairfax, Virginia that I started falling in love with programming on the Internet. Looking back, I was hacking around with ASP, PHP, HTML, databases… never learning in a formal setting, since none of this was covered in the CS classes I enrolled in. I was just playing, and trying to find resources that could help me better understand the dependencies and histories of languages, their ideal uses, and so on.

I learned everything by trial and error, finding others I could ask questions, and playing around with ideas. That was my learning method.

As the Web expanded, my learning expanded right along with it. I count myself lucky to have come of age at a time when the Web was doing the same, and people were making learning resources available online. As a lifelong entrepreneur, I am convinced that learning to code has been essential for me in learning to create in business. As an athlete, I self-taught my advanced skills, but never contemplated the role of failing forward and failing fast in skill development. I simply loved playing, and that was my focus.

As an entrepreneur and programmer, failing was not fun or playful. I had to discover something about my own attitude towards skill development that was crucial in sports and life in order to succeed in both. I never realized that it took personal courage and a productive attitude to overcome failure. I never realized that I had positive expectations sustaining my efforts through failures. I never counted how many shots I missed, or errant balls I threw. I just played with endless enthusiasm.

Becoming mindful of the process of learning to succeed by failing forward changed everything for me. It affected the speed I embrace in creative work: to start, fail, change methods, fail, iterate, fail, shift focus, fail, and keep moving forward with eyes on the objective, which is that I remain interested and enthusiastic about pursuing what I learn. None of this required me to be naturally gifted; it only mattered how I did what I did to improve.

So that is the root message of my own learning that I bring to all our programs: “Fail Forward, Fail Fast.”

My non-profit interest traces back to my upbringing in a home that leans heavily left, with lots of social workers, economists, and university degrees in the ranks of my relatives. One of my first jobs was working for a local homeless shelter on Long Island, delivering food to their shelters. I met women with PhDs who were homeless, and who changed my perspective on what real poverty was. That influenced much of what I do. I am a big believer in empowering people, and helping them overcome the structural deficiencies that separate the haves from the have-nots. At the root of my efforts is a constant interest in understanding entrepreneurial creativity, economics, and personal skill development—and helping people advance all of those.

I founded our non-profit in Virginia after attending the university, and then staying around Fairfax during the time when the Internet and the Web were entering the public sphere of interest and influence. My energies, throughout my entire adult life, have been entrepreneurial in nature, and the non-profit allowed me to bring technology skills and subjects into contact with kids and families in a powerful way. I simply volunteered to teach, to share ideas. And over time that work grew based on the interaction that those people were having with my ideas and methods.

OYO—own your own—has been the guiding philosophy behind our efforts. For young people and families, it resonates in a simple and powerful way. Kids get it immediately. Self-sovereign empowerment, personal data control, and skill development are all baked into our methods and tools. From the beginning, our kids have always had control of their learning outcomes, their data, their methods, their paths of study, their voices, and their ability to make mistakes: to fail forward. It’s all baked in. Kids follow what I talk about. When I visited the United Nations to discuss the role of identity in Human Rights, they knew why I was there. I went into schools and built projects with kids using the Scratch and Python languages, explaining the UN’s Universal Declaration of Human Rights and discussing how children are affected by such simple things as having a self-sovereign identity they truly own, and how administrative identifiers given to them are not the same thing.

The toy business grew alongside this. With Melora’s background in psychology, and our own son creating needs that drove solutions, sensory play products, construction play products and technology play were front and center every day. As a Dad, I tested my own ideas about the earliest ages at which different methods could develop an entrepreneurial mindset about skill development and participation in society for kids as creative human beings. At age three, our son was delivering eggs from his backyard chicken farm to neighbors, setting prices, tracking expenses on crayon-drawn charts, and learning that root authority in this life is personally created, as are the products and services that allow people to make money. Not jobs. That’s a concept that shows up much later. Only now, at age thirteen, and about to turn fourteen (the working age), has this become a subject of inquiry. On a foundation he owns, all possibilities exist for him to explore.

DS: We’ve talked about the effect of your work on kids, and Dr. Nagler talked about those kids’ effect on the world. He also talked a bit about the challenges his schools would face adapting to your methods, and how they tend to rocket kids upward in expertise faster than the existing system can handle. Now I’m wondering where you see this going for formal education?

DL: I see education splitting into two philosophical pathways.

One is old-school CS/E, with roots in research pedagogy. This has all the hallmarks of a lack of gender/race diversity and tends to attract specific types of students, led by traditional types of teachers. Most CS programs cannot support demand from students on University campuses today, and in K-12 the focus on AP CS testing requirements, as the context of learning, remains broken. For example, no computer is needed for the final test; it’s all paper-and-pencil responses. But this is the default system, and it will persist, because it’s deeply institutional.

The other is creative computing in CS/E. MIT leads the way here, with forty percent of their undergrad population studying CS, according to a recent New York Times article. Harvard’s CS50 also supports this path by starting students off with the Scratch language. It also reports higher engagement by girls/minorities. What matters, as far as we’re concerned, is that creative computing works. It vastly outperforms old-school CS. And at some point, the market will move in the creative direction, because the demand will be there. And many of the kids in our communities today will supply that demand.

DS: What are the larger implications of that, outside the academy?

DL: Mainly ramifications for the future workforce. It will apply everywhere because CS applies everywhere. We live in a digital world now. You’ll see it in physics, applied math, arts, humanities, and social sciences. Anywhere the creative use of data and computing methods will make a difference.

And one of those differences will be empathy for local human conditions. This is sometimes lacking, radically, on the traditional path. We see this in all the ethical arguments against the abuses of people by many typical VC-funded Silicon Valley companies.

DS: We didn’t talk much about what you’re doing with other institutions outside Mineola. Can you give me a rundown of what’s happening in Stony Brook and other places? I want to get a sense of how your fires start and spread.

DL: So far, we are working with less than fifty percent of the hundred and thirty school districts on Long Island. There is plenty of potential, but many of the remaining districts here are not in a position to work with us, either due to their own budgetary planning and pension funding requirements, or due to lack of computing resources, such as laptops and computers for kids and teachers, or even functioning Wi-Fi. Those are real obstacles to progress.

The biggest obstacle is admin-level and teacher familiarity with this type of literacy. You can imagine the looks of confusion we encountered describing what we were doing in 2012. It’s getting better in 2019, but there is still lots of room for improvement. Setting accurate expectations is another challenge. Too often school leaders do not believe this type of learning is accessible to five, six, and seven-year-old kids, in a practical/meaningful way. So they stop waaayyy short of what kidOYO offers, choosing much lower level “digital literacy” pedagogies that make sense to a Facebook/Instagram user, with almost no context of functional literacy. In these cases, parents route around the problem of low-performing schools and bring their kids directly to our programs in large numbers. That can influence schools as well. We don’t need to push. The parents and kids do that.

At Stony Brook, we are actively discussing our own capability to support an “OYO University” self-led model of learning with mentor support on campuses that are struggling to meet the demand of students for this kind of learning. Stony Brook has been a great partner of ours for many years, with around six hundred fifty students enrolled in their Computer Science degree program. The demand from non-CS students to take CS classes is so high that they had to shut out students from enrolling in CS classes. And this is by no means unique to SBU. It’s happening all over the country and the world.

We are about to release OYO University platform deployments broadly as well. We just pitched SBU on our ability to support their 25,000 undergrad students starting in Fall 2019, and we’re looking for a first University partner to provide CS/E/Coding to all students on their campus.

Toward that, to date we’ve donated $125,000 to the SBU Foundation in support of a “Mentor Service Award.” That’s a lot for an unfunded non-profit. And this could scale with help.

This is just a fraction of what we talked about since our visit to see kidOYO at work. And I have to say that I want to be cautious about getting too enthusiastic. Reporters are supposed to be as objective as possible, and not to get emotionally invested in what they cover. But without emotional investment Linux wouldn’t be here, and neither would Linux Journal.

Marshall McLuhan taught that our tools are extensions of our selves, and that they shape us after we shape them. He also said every new medium “works us over completely.” That’s what’s happening in our new digital age, which is still very new.

Not long ago I was talking with Joi Ito, an old friend who runs the MIT Media Lab, about historical precedents for what we might call our species’ digital transition: the one by which we become digital as well as physical animals. Was it as big as the industrial revolution? Movable type? Writing? Speech? Joi said, “I think it’s the biggest thing since oxygenation.” In case you’re counting, that happened about two and a half billion years ago. It’s a big deal.

There’s a dystopian way of looking at what becoming digital does to kids. There’s also a utopian one. Both are polar extremes that surely won’t happen. But after watching these kids in Mineola learn, and help each other, and invent stuff both for fun and because it’s useful, and become so adept, so fast, at creative computing… well, color me optimistic. With these kids, and others like them, the world is in good hands.


The Pragmatic Engineer

The Pulse #113: Engineering culture at Google vs Meta

Also: AI now generates 25% of code at Google; Deep cuts at profitable companies like Dropbox and Miro; Business booming at the likes of Google, Meta and Microsoft, and more.

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. AI creates 25% of new code at Google, GitHub Copilot ditches ChatGPT exclusivity, Arc browser struggles for mainstream attention, Microsoft engineer shrinks a huge Git repo by 94%, Mozilla could become an ads company, and more.

Engineering culture at Google vs Meta. Software engineer Roman Kirillov joined Google after 14 years and shares impressions on how the two companies differ, and how they are similar: a “monolithic” engineering culture at Meta, talk about moving fast — and then actually moving really fast — and outages being more “everyday events” than at Google.

Deep job cuts at profitable companies: Dropbox, Miro. Both companies are growing and profitable, but still made cuts of around 20%. A reminder that laying people off while profitable is a real downside of going public or taking VC funding.

Big Tech: business is booming; what recession? The largest tech companies are doing very well business-wise, recent results show. They all seem to be hiring at a more conservative pace than their revenue growth, though.

1. Industry pulse

AI creates 25% of new code at Google

It has barely been two years since ChatGPT was released and took the world by storm. Google responded with Bard – now renamed Gemini – and today, the majority of software engineers use some kind of AI coding assistant.

Google has built and rolled out internal, custom GenAI tools, and uses these actively. This week, CEO Sundar Pichai shared during the company’s earnings call that “more than a quarter of all new code at Google is generated by AI, and then reviewed and accepted by engineers. This helps our engineers do more and move faster.”

Most focus is on the “25%” figure, but I’m more interested in “reviewed and accepted by engineers.” Google is known to use very heavy testing automation and modern shipping approaches like canarying, feature flags, experimentation, etc. Despite this, no GenAI code ships to production without human review! I think they’re right to not trust GenAI by itself.

Google is invested in making GenAI a success, and Gemini is a growing part of their business. So it’s safe to assume that Google is further ahead than its competitors in integrating Gemini (and GenAI) into its engineering pipeline. Google has been a trailblazer at innovating large-scale software engineering approaches, and GenAI code generation will be the norm across Big Tech. I suspect it already is.

GitHub Copilot ditches ChatGPT exclusivity

Read more


IdM Laboratory

IIW 39 Day 2 Quick Review

Hello, this is Fujie. Internet Identity Workshop (IIW) is now on Day 2. I had a working group call from early this morning, so I was on Zoom calls at the venue from an early hour, which made the day even more tiring. Today's menu is five sessions: Delegation + Impersonation for Agents on behalf of human; Wallet + Key Attestation; Zero Knowledge Proof for mdoc; Originator Profile; and JSON-LD VC with BBS Signature. Let's take a look. Delegation + Impersonation for Agents on behalf of human…
Hello, this is Fujie.
Internet Identity Workshop (IIW) is now on Day 2.
I had a working group call from early this morning, so I was on Zoom calls at the venue from an early hour, which made the day even more tiring.
Today's menu is five sessions: Delegation + Impersonation for Agents on behalf of human; Wallet + Key Attestation; Zero Knowledge Proof for mdoc; Originator Profile; and JSON-LD VC with BBS Signature.
Let's take a look. Delegation + Impersonation for Agents on behalf of human (OIDC, OAuth). First up is the increasingly common topic of having an agent do something on your behalf.

We discussed various things, but in the end it boiled down to how this maps onto the OAuth model, so there wasn't much that felt new.
The mapping was: Resource owner = end user, Client = agent, Relying party = API. Which is about what you would expect.
Agent or not, it's ordinary OAuth. There was also some talk that Token Exchange could be useful here.
Wallet + Key Attestations - Paul, Christian, Kristina. This was about the German wallet work; they have built up quite a lot of know-how.
Two attestations are under consideration: a Wallet/Client Attestation, which shows that the wallet itself is legitimate, and a Key Attestation, which shows the state of key storage and the user's authentication status. In the EU the model is that the state issues credentials to government-certified private-sector wallets, so unless it can be shown that the wallet is not a rogue one, that the user has been authenticated, and that the private key is properly managed, issuing credentials would simply be too risky.
They have also thought through the timing of these checks: when the Issuer issues a credential, both the wallet and key attestations are required; when the wallet presents a credential to a Verifier, they are optional (one view being that since the Issuer checked at issuance time, the check carries over transitively).

The attestations themselves are issued by the Wallet Provider. The issuance and management protocol for this part is not standardized, though it may become a standard eventually. (Leaving it up to each wallet vendor would probably be fine too.)
With this mechanism in place, scenarios like the following can be handled: switching to a new phone, managing use across multiple devices, a stolen or lost device, or a vulnerability discovered in the wallet. Concretely, because the Issuer keeps track of the receiving wallet's information (its attestation) when it issues a credential, the wallet provider revoking a wallet attestation can trigger the Issuer to revoke the credentials it issued to that wallet. That way, when a device is replaced or stolen, the credentials that were on the old device can be invalidated in one go, which is reassuring. Unlike a simple attribute attestation, an identity document used for identity verification really does require this kind of issuance management, so you avoid the odd situation we have in Japan, where a private-sector wallet's credential derived from the My Number Card (essentially a copy of the card) can be used as an identity document.
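As a rough illustration of that revocation cascade (my own minimal Python sketch, not anything shown in the session; the names and data structures are hypothetical), the Issuer only needs to remember which wallet attestation each credential was bound to at issuance time:

from collections import defaultdict

# Issuer-side bookkeeping: wallet attestation ID -> credential IDs issued to that wallet.
issued = defaultdict(list)
revoked_credentials = set()

def issue_credential(credential_id, wallet_attestation_id):
    """Record the wallet attestation a credential was issued against."""
    issued[wallet_attestation_id].append(credential_id)

def on_wallet_attestation_revoked(wallet_attestation_id):
    """Triggered by the wallet provider; the Issuer revokes everything bound to that wallet."""
    for credential_id in issued.pop(wallet_attestation_id, []):
        revoked_credentials.add(credential_id)

issue_credential("cred-1", "wallet-A")
issue_credential("cred-2", "wallet-A")
on_wallet_attestation_revoked("wallet-A")  # e.g. the phone was reported stolen
print(revoked_credentials)                 # {'cred-1', 'cred-2'}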
Incidentally, there was also a discussion about whether the wallet attestation should be presented to the Verifier as well. Personally I think it's better not to, since it increases linkability; this really seems like a world where the Issuer side should do the management.
Zero Knowledge Proof for mdoc - Google. Next, someone from Google talked about a ZKP implementation for mdoc.
As I noted in the wallet attestation session above, with both mdoc and SD-JWT, information that increases linkability can end up embedded as part of the protocol. The question is whether something can be done about that.

A consequence is that the key proving device binding can only live in the Secure Element. Since that key is used at presentation time, it becomes very hard to use schemes such as BBS+ that require changing the device-bound key at issuance time, because that would mean changing the lowest layer of the stack.
Even if selective disclosure with mdoc or SD-JWT solves the data minimization problem, the linkability problem remains. That has been said for a while, and it feels like people are finally starting to tackle it.


Google is working on speeding up the internal logic and exploring an approach (Hyrax) that is not one of the conventional "smart" methods like BBS+.

Originator Profiles - Shigeya Suzuki. Professor Suzuki presented Originator Profiles. I had never actually heard the details before, so it was very interesting. https://originator-profile.org/ja-JP/
It is a mechanism for verifying both who published a piece of content and the authenticity of the content itself, and additionally for confirming whether the information is being distributed in an approved place (such as an aggregator).
Currently, if you install a browser extension and run the check, you can see which media outlet published the content (news and so on) displayed in the browser and what that outlet's profile looks like, and you can also confirm that the content is being displayed on a properly authorized site.
It's good that users themselves can check for disinformation and misinformation, and if it prevents advertisers' ads from being placed on unintended sites, it should also help protect brand image.
I'm very much looking forward to seeing where this goes.

JSON-LD VC with BBS - Dan Yamamoto. Finally, Dan Yamamoto of IIJ talked about BBS.

The BBS part was mostly finished in earlier rounds, so the focus this time was, again, linkability. The whole day felt like it revolved around that theme; it really is a hot area.
Yamamoto-san's approach is to use a pseudonym did:key. By making clever use of a technique that can create multiple public keys corresponding to a single private key, the identifier (did:key) containing the signature verification key used as the Subject Identifier can be varied between issuance and verification.
By applying this per domain, you can address both intra-domain linkability and inter-domain linkability.


It has not been brought into standardization yet, but hopefully it will be standardized going forward.

And with that, tomorrow is the final day.

Wednesday, 30. October 2024

IdM Laboratory

IIW 39 Day 1 Quick Review

Hello, this is Fujie. The (fall) Internet Identity Workshop is being held again this year, at the familiar Computer History Museum in Mountain View. Identity geeks from around the world gather to discuss half-baked technologies. (Well, it's the usual crowd, including the SIDI Hub folks.) So let's look at Day 1. Progressive Trusted Registry - Dmtri, etc. A session based on a new paper by the familiar Dmtri and company. https://docs.google.com/document/d/1sZ7za4gwyKopd576eAWUGHZ-Q_vWd8aqdm7v-X1INEk It was held outdoors, so it was a bit chilly. The theme was…
Hello, this is Fujie.
The (fall) Internet Identity Workshop is being held again this year, at the familiar Computer History Museum in Mountain View.
Identity geeks from around the world gather to discuss half-baked technologies. (Well, it's the usual crowd, including the SIDI Hub folks.)
So let's look at Day 1.
Progressive Trusted Registry - Dmtri, etc. A session based on a new paper by the familiar Dmtri and company. https://docs.google.com/document/d/1sZ7za4gwyKopd576eAWUGHZ-Q_vWd8aqdm7v-X1INEk
It was held outdoors, so it was a bit chilly.
The theme: how do you construct a Trusted Registry? It was a discussion of an approach that uses Linked Claims to build up trust incrementally and progressively after KYC, for example by using reputation, or verifiable attestations issued by entities governed by other trust frameworks. Since we live in an era where reputation can be bought, approaches such as using negative claims to filter out the noise were also discussed. Graph theory, in other words; it reminded me of the old Connect.me and Respect Network conversations. Reputation scoring also came up, but in the end how much a Verifier trusts a credential's Issuer depends heavily on whether the Verifier and Issuer share the same context, so modeling this is very hard. It's really a question of empathic cognition.
Looking at it this way, though, everyone seems to want to build a central registry. A distributed-database-style architecture for dynamically constructing and discovering the graph feels like a better fit for Linked Data. In the end, what everyone really wants is the Semantic Web.

Digital Fiduciary Initiative - Joe Andrew. As the title suggests, this was about how a digital version of fiduciaries (people like doctors or accountants who provide professional services on a client's behalf) can help build trust.
Civilization arose, tribal societies formed, and boundaries appeared between the inside and the outside of society. Quite the starting point.
As mechanization and IT advance, the role humans play shrinks; declining marginal cost, indeed. In that world, people need to assert trustworthy information in order to participate in society, while at the same time the surveillance society advances. Bitcoin, meanwhile, made trustworthy transactions possible without a trusted authority (in full public view), though it also created problems such as becoming a breeding ground for money laundering. DIDs have produced a certain amount of innovation by bringing that idea into the world of identifiers: where Login with Facebook requires trusting Facebook, DIDs place trust in cryptographic algorithms. To realize decentralized identity, the remaining problem is how to trust VCs.
The Digital Fiduciary Initiative's approach:
A fiduciary is someone, such as a doctor, lawyer, or accountant, who handles a specialized domain on other people's behalf. A digital fiduciary helps people handle identity. The Digital Fiduciary Initiative provides a protocol that supports the resolution process for determining who is fit to handle identity as a fiduciary.

I see: the entrusted fiduciary service issues credentials (Fair Witness Credentials) to users as Verifiable Credentials. Perhaps it is close to a doctor providing medical records to a patient as VCs, or an accountant issuing an audit certificate as a VC?
The image that comes to mind is an information bank plus a notary.
The work is happening here, for anyone who is interested: https://digitalfiduciary.org/

CBOR and Gordian Envelope - Christopher Allen. Next up was Christopher Allen on CBOR and Gordian Envelope. It was a cryptographer's talk and a hardcore discussion session with no slides, so honestly there was a lot I couldn't keep up with, but here is the rough picture. Incidentally, he has written a lot about this on his blog, so have a look there if you're curious: https://www.blockchaincommons.com/introduction/Envelope-Intro/


The CBOR part was about how building the DID Controller Document (the so-called DID Document) with dCBOR makes handling key rotation and adding signatures lighter. Today's DID Controller Documents keep growing as you add and rotate keys, because public key information keeps being appended. So instead of doing it JSON-based, doing it in binary CBOR reduces the size, lowers the cost of block creation and propagation on a distributed ledger, and noticeably cuts communication costs in IoT scenarios.
Also, as a solution to values not being handled deterministically, for example floating-point calculations producing slightly different results on different computer systems, the recommendation is to use Deterministic CBOR (dCBOR) (the reason came later).
The dCBOR spec is here: https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/
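To make the deterministic-encoding idea concrete, here is a minimal Python sketch of my own, assuming the third-party cbor2 library. It shows canonical CBOR encoding, which dCBOR builds on with further constraints, rather than the dCBOR draft itself: logically identical maps serialize to identical bytes, so hashes over them are stable.

import hashlib
import cbor2  # third-party library, assumed installed (pip install cbor2)

# Two logically identical DID-document-like maps with different key order.
doc_a = {"id": "did:example:123", "verificationMethod": [{"id": "#key-1", "type": "Multikey"}]}
doc_b = {"verificationMethod": [{"id": "#key-1", "type": "Multikey"}], "id": "did:example:123"}

# canonical=True applies canonical encoding rules (ordered keys, shortest forms),
# so both maps serialize to exactly the same bytes.
enc_a = cbor2.dumps(doc_a, canonical=True)
enc_b = cbor2.dumps(doc_b, canonical=True)

assert enc_a == enc_b
print(hashlib.sha256(enc_a).hexdigest())  # stable digest regardless of key order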
Then, on to Gordian Envelope. My rough understanding is that the idea is to build a Merkle tree out of the Subject, Object, and Predicate elements that make up an Envelope, including the substructures that each of Subject, Object, and Predicate can themselves contain.
And if the hash of each element is produced deterministically with dCBOR, the tree comes out cleanly and the amount of computation can be reduced further. Something like that, I think. (I don't fully understand it yet; I need to do the reading.)
The Gordian Envelope spec is here: https://datatracker.ietf.org/doc/draft-mcnally-envelope/
It can apparently support selective disclosure, and by giving bidirectional meaning to the edges between nodes when the tree is viewed as a graph, it can also support multisig and group signatures, among other things, but I need to study it a bit more.
DCQL - Daniel Fett. Next was the new query language for OpenID for Verifiable Presentations: DCQL, the Digital Credentials Query Language (pronounced "dackle"). As everyone knows, the current OID4VP uses Presentation Exchange, but it is too complex, so the idea seems to be to simply create a new query language.

Incidentally, Daniel's T-shirt said DCQL, and both the illustrations and the photos on the slides were dachshunds.
DCQL is slated to be included in the next OpenID for Verifiable Presentations Implementer's Draft. Its characteristics: it may replace Presentation Exchange in OID4VP; it is a JSON-based syntax for requesting credential presentation; and it is credential-format agnostic (sd-jwt-vc and mdoc are both fine).
The motivations for developing it seem to be:
- PE is convoluted
- It needs to be usable with the browser API
The specific PE pain points called out were:
- JSONPath is required
- JSON Schema filters are required
In short, PE is feature-rich, but OID4VP did not need all of those features. Apparently a lot was learned here from Germany's FUNKE competition.
As an aside, the speaker, Daniel Fett, also created and proposed a query language called Advanced Syntax for Claims (ASC) back when the OpenID Connect for Identity Assurance spec was being developed, so the two are quite similar in many respects. I asked whether they might eventually be merged; the vibe was "maybe".
The intended usage is to attach it to the Authorization Request as a query parameter, in the form dcql={....}. For example, for sd-jwt-vc the query looks something like this:
{
  "credentials": [
    {
      "id": "my_credentials",
      "format": "vc+sd-jwt",
      "meta": { "vct_values": ["https://xxxxx"] },
      "claims": [
        { "path": ["last_name"] },
        { "path": ["first_name"] },
        { "path": ["address", "street_address"] }
      ]
    }
  ]
}
なお、mdocの場合はpathがないのでその代わりにネームスペースを使います。
クエリ言語としての機能としては、 - Claimのorが取れる - クレデンシャルのorが取れる - Claimのバリューマッチングができる というところのようです。
例えば、xとA1、A2、yというクレームのセット、もしくはxとBとyでも良いよ、という書き方はこのようになります。 "claim_sets": [ ["x", "A1", "A2", "y" ], →この組み合わせか ["x", "B", "y"] →この組み合わせでもOK ]
同じようにクレデンシャルを複数要求してみてその中のどれかを提示すれば良いよ、というクレデンシャルのorを取るならこんな感じです。 "credential": [ { "id": "pid", "format": "vc+sd^jwt" }, "credential_sets": [ { "purpose": "Identification", "options": [ [ "pid" ], [ "other", "other2"] ], "purpose": "show your xxx", "required": false, "options": [ [ "nice_to_have" ] ] } ] ]
optionsでidでorを書けるのに加えてrequiredでオプションかどうかについてもかけるわけですね。
And finally, matching by value:

"claims": [
  { "path": ["last_name"], "values": ["Doe"] },
  { "path": ["postal_code"], "values": ["90210", "90211"] }
]

This one is easy to understand.
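As a rough illustration of the dcql={....} usage mentioned above, here is a hedged sketch of attaching such a query to an authorization request. The parameter name follows the form given in this post, and the client and endpoint URLs are made up for the example rather than taken from any spec.

# Hedged sketch only: parameter and endpoint names are illustrative, following
# the dcql={...} form mentioned in the post rather than a normative OID4VP parameter.
import json
from urllib.parse import urlencode

dcql_query = {
    "credentials": [
        {
            "id": "my_credential",
            "format": "vc+sd-jwt",
            "claims": [{"path": ["last_name"], "values": ["Doe"]}],
        }
    ]
}

params = {
    "client_id": "https://verifier.example",   # hypothetical verifier
    "response_type": "vp_token",
    "dcql": json.dumps(dcql_query),             # the query attached as a parameter
}
print("https://wallet.example/authorize?" + urlencode(params))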
There are apparently many things not yet implemented, such as multi-language support, so feedback is very welcome; though if too much feedback piles on too many features, it will end up being the same thing as PE, so keeping it moderate seems wise.

Edge Identifiers - Christopher Allen
The last session was Christopher Allen again. It was actually touched on in his earlier session too, but this one was about Edge Identifiers: making the edges bidirectional when building a graph and giving them identifiers.
It seems to be about making edges bidirectional and giving them meaning, but honestly my understanding has not caught up yet, so this is another one to study over time.
Given a parent-child relationship, the following edges exist between parent node A and child node B:
- A is the parent of B
- B is the child of A
So even a plain "edge" has direction, and expressing it in both directions seems to be the key point.
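A toy sketch of my own (not Christopher Allen's actual model) of what "one identifier per direction" might look like: each directed edge gets its own identifier, here just a truncated hash of its description.

# Toy sketch only: the parent/child relation is represented as two directed edges,
# one per direction, each of which can be assigned its own identifier.
import hashlib

def edge_id(source: str, predicate: str, target: str) -> str:
    return hashlib.sha256(f"{source}|{predicate}|{target}".encode()).hexdigest()[:16]

edges = [
    ("A", "isParentOf", "B"),
    ("B", "isChildOf", "A"),
]
for src, pred, dst in edges:
    print(edge_id(src, pred, dst), src, pred, dst)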
This too is written up on his blog: https://www.blockchaincommons.com/musings/musings-cliques-1/


And with that, day one is over. As always, the West Coast is rough because I end up living on Japan time and local time at once. Next week is IETF in Ireland, so it may be a little easier.

Mike Jones: self-issued

OpenID Presentations at October 2024 OpenID Workshop and IIW plus New Specifications


I gave the following presentation on work in the OpenID Connect working group at the Monday, October 28, 2024 OpenID Workshop at Microsoft:

OpenID Connect Working Group Update (PowerPoint) (PDF)

I also gave this invited “101” session presentation at the Internet Identity Workshop (IIW) on Tuesday, October 29, 2024:

Introduction to OpenID Connect (PowerPoint) (PDF)

There’s more happening in the OpenID Connect working group than at any other time since we started the OpenID Connect work. In fact, two new specifications were adopted today!

- OpenID Federation Wallet Architectures 1.0
- OpenID Connect Relying Party Metadata Choices 1.0

Thanks to all who helped us get there!

Tuesday, 29. October 2024

Just a Theory

PGConf & Extension Ecosystem Summit EU 2024

Notes and links from the Extension Ecosystem Summit EU 2024 and my first time at PGConf EU. Plus thoughts and photos from ancient sites and archeological museums I visited.

Last week I MCed the first Extension Ecosystem Summit EU and attended my first at PGConf EU in Athens, Greece. Despite my former career as an archaeologist — with a focus on Mediterranean cultures, no less! — this was my first visit to Greece. My favorite moment was the evening after the Summit, when I cut out of a networking shindig to walk to Pláka and then circumnavigate the Acropolis. I mean just look at this place!

The Acropolis of Athens on the evening of October 22, 2024. © 2024 David E. Wheeler

Highlight of the trip for sure. But the Summit and conference were terrific, as well.

Extension Ecosystem Summit

Floor Drees kindly organized The Extension Ecosystem Summit EU, the follow-up to the PGConf.dev original. While the Vancouver Summit focused on developers, we tailored this iteration to users. I started the gathering with a condensed version of my POSETTE talk, “State of the Postgres Extension Ecosystem”, but updated with a Trunk OCI Distribution demo. Links:

PDF, PDF with Notes, Keynote

We then moved into a lightning round of 10 minute introductions to a variety of extensions:

- Jimmy Angelakos showed off pg_statviz, a “minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics”.
- Adam Hendel gave a brief history of pgmq, a “lightweight message queue like AWS SQS and RSMQ, but on Postgres.”
- Gülçin Yıldırım Jelínek introduced pgzx, “a library for developing PostgreSQL extensions written in Zig.”
- James Sewell talked about pgvectorscale and pgai for managing machine training workloads on Postgres.
- Alastair Turner described pg_tde, an extension that provides transparent data encryption for Postgres.

Quite the whirlwind! There followed open discussion, in which each maintainer went to a corner to talk to attendees about contributing to their extensions. Details to come in a more thorough writeup on the Tembo blog, but I personally enjoyed some fascinating discussions about extension distribution challenges.

PGConf.eu

Following the Summit, I attended several thought-provoking and provocative presentations at PGConf.eu, which took place at the same hotel, conveniently enough.

Floor Drees speaking at PGConf.eu 2024. © 2024 David E. Wheeler

- In the conference keynote, Stacey Haysler explained The PostgreSQL License Fee. I’m pleased to say that my employer “pays” the license fee!
- Andres Freund (yes, that one) summarized NUMA vs PostgreSQL, covering some of the issues and opportunities for optimization of PostgreSQL on servers using the NUMA multi-processor memory architecture.
- Heikki Linnakangas offered an overview of The Wire Protocol, that bit of PostgreSQL technology that lets clients talk to PostgreSQL.
- Ophir Lojkine and Thomas Guillemard showed how a pretty sweet framework, SQLPage, lets archaeologists like Thomas write complete web applications in pure SQL, enabling rapid data entry while in the field.
- Tomas Vondra’s Performance Archaeology dug into the history of PostgreSQL improvements, mapping their compounding returns over time.
- Floor Drees (photo above) talked about Supporting extensions, but really now, making a strong case for the need to organize infrastructure to support the broader extension ecosystem.

There were many more talks, but clearly I tend to be drawn to the most technical, core-oriented topics. And also archaeology.

Museums

Speaking of which, I made time to visit two museums while in Athens. First up was the National Archaeological Museum of Athens, where I was delighted to explore the biggest collection of Mycenaean artifacts I’ve ever seen, including massive collections from the excavations of Heinrich Schliemann. So much great Bronze Age stuff here. I mean, just look at this absolute unit:

From the museum description: “Fragment of a krater depicting a chariot with two occupants. A male figure holding a staff walks in front of the chariot. Much of the Mycenaean Pictorial Style pottery (14th-12th centuries BC) with representations of humans, chariots, horses and bulls on large kraters, was produced at Berbati in the Argolid and exported to Cyprus, where it was widely imitated. Birds, fish, wild goats or imaginary creatures (i.e. sphinxes) occur on other types of vessels, such as jugs and stirrup jars. Usually only fragments of these vases survive in mainland Greece from settlement contexts. In Cyprus, however, complete vases are preserved, placed as grave gifts in tombs.” © Photo 2024 David E. Wheeler

The animal decorations on Mycenaean and Akrotiri pottery are simply delightful. I also enjoyed the Hellenistic stuff, and seeing the famed Antikythera Mechanism filled my nerd heart with joy. A good 3 hours poking around; I’ll have to go back and spend a few days there sometime. Thanks to my pal Evan Stanton for gamely wandering around this fantastic museum with me.

Immediately after the PGConf.eu closing session, I dashed off to the Acropolis Museum, which stays open till 10 on Fridays. Built in 2009, this modern concrete-and-glass building exhibits several millennia of artifacts and sculpture exclusively excavated from the Acropolis or preserved from its building façades. No photography allowed, alas, but I snapped this photo looking out on the Acropolis from the top floor.

The Acropolis as viewed from inside the Acropolis Museum. Friezes preserved from the Parthenon inside the museum reflect in the glass, as does, yes, your humble photographer. © 2024 David E. Wheeler

I was struck by the beauty and effectiveness of the displays. It easily puts the lie to the assertion that the Elgin Marbles must remain in the British Museum to protect them. I saw quite a few references to the stolen sculptures, particularly empty spots and artfully sloppy casts from the originals, but the building itself makes the strongest case that the marbles should be returned.

But even without them there remains a ton of beautiful sculpture to see. Highly recommended!

Back to Work

Now that my sojourn in Athens has ended, I’m afraid I must return to work. I mean, the event was work, too; I talked to a slew of people about a number of projects in flight. More on those soon.

More about… Postgres Extensions PGConf Summit Archaeology Athens Greece Acropolis Mycenae Elgin Marbles

The Pragmatic Engineer

Hiring software engineers and engineering leaders from Big Tech (Part 1)

A dive into why hiring Big Tech talent can be a bad idea for startups, a look at cases when it works, what makes it hard to recruit from Big Tech, and how to do it

Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

Before we start: the Korean translation of The Software Engineer’s Guidebook is out! If you are based in Korea, you can get it from Hanbit Media (the publisher), from Kyobo, from YES24 and from Aladin. The Korean edition runs to 568 pages and is a lot more than "just" the original book: it includes an appendix with insights from five prominent Korean developers, including engineers from large Korean companies and startup CTOs. This addition provides a local perspective and practical insights on software engineering. This is the first translation of the book – other languages like German, Japanese and Chinese will follow in the coming months!

Two copies of the Korean translation of The Software Engineer’s Guidebook: the right one with a book band. Get the book here.

There are many standout software engineers and engineering leaders in Big Tech, and it’s easy to assume that hiring them is a sure win for any startup and scaleup. But counterintuitively, recruiting techies from Big Tech is often very difficult for startups. Sometimes, it’s simply very hard to get tech professionals interested in a smaller company, even when they’re a good fit.

A few weeks ago, we dug into reasons why software engineers quit Big Tech. In this article, we look into ways to attract folks to startups.

For this piece, I talked with techies in senior roles at startups. Keeping identities anonymous, this deep dive covers:

Why Big Tech hires are often poor fits for startups

When hiring from large companies doesn’t make sense

When it does make sense

Why is it hard to hire from Big Tech?

How to “poach” from Big Tech

Part two of this mini-series will cover how to pitch opportunities to Big Tech folks, with advice from hiring managers at startups about their successful approaches.

The Pragmatic Engineer deepdives related to this topic:

Hiring software engineers

Hiring an engineering manager

Good onboarding, great onboarding

Inside Amazon’s engineering culture

1. Why Big Tech hires are often poor fits for startups

Let’s start with the elephant in the room; it’s a terrible idea to hire someone from a big company into a small, scrappy startup. Here’s the founder of a data startup on their personal experience:

“Some of our hires from Google wanted to replicate all Google’s processes/culture, and completely failed. One staff engineer was the worst hire I can remember; they were so certain of their excellence and Google's superiority, that they ignored what made our company outstanding.”

An ex-Big Tech cofounder of an AI startup offers their experience:

“We've had Big Tech folks consistently fail our interviews on some rather fundamental engineering best-practice questions. We don't ask Leetcode questions and never will, but we found that BigTech candidates (Meta, Google, Stripe) had a particularly hard time with basic system design and coding questions.”

There are other reasons, including:

“Entitlement.” One thing mentioned by a few folks at startups is that some recruits from Big Tech are noticeably pleased about that fact, with an “I worked in Big Tech, therefore I’m awesome” mentality. Of course, it’s understandable to feel pride at having got into Big Tech and gained valuable experiences, as a career achievement. But when joining a startup from Big Tech, it seems sensible to be driven more by curiosity and humility than by judging the new workplace against the old one. 

Startups do operate very differently from large companies, and the best way to make a difference and not alienate colleagues is to soak up a new environment, first!

Success in Big Tech is often about managing optics, sometimes without real stakes. A founding engineer shares that there are plenty of seemingly successful engineering leaders in Big Tech who operate well, run great meetings, have excellent project management skills… and still ship lackluster products.

Some characteristics can appear as ownership and agency, when they’re not. So, it’s easy to hire folks who are good at following processes, but not at being owners. Former Stripe product manager Shreyas Doshi describes this in the thread, “Operators optimizing for optics.”

Lack of broad experience with tools. A founding engineer at a fintech startup shares:

“I came across folks with FAANG experience who did not even know JOINs on SQL! This was because they've only queried their internal non-relational datastore.

I had a friend who bragged about 10x-ing the QPS on a service at Google, but when I asked how they'd approach a Flask app running Postgres, they were completely clueless as to where to even start.

There's real skill in navigating FAANG stacks, but it's frequently using internal tools that someone else wrote for a promo packet, with little bearing on the "stitching together open source tools" of startup-land.

Many ex-FAANG people are unprepared for the upfront cost of learning the ecosystem outside of their silo. Non-technical startup founders or executives don't predict this; they just see the elite job background, and assume all candidates from that background will be strong in a role.

Focus on things startups don’t care about. An ex-Google engineer working at a startup says:

“Most FAANG engineers I've met do years of work without ever talking to a customer. In the ZIRP 2010s especially, they never had to worry about a cost, ever. 

In a FAANG environment, there's a lot of focus on things that your early startup shouldn't care about – but which FAANG engineers do!

These include: 

A deep engineering ladder and promotion process

Expectations of consistent and/or relaxed working hours

Make most decisions in meetings

Architecture reviews

Restarting work because someone found a technical snag that prevents a hypothetical scaling event

Technical things: 

Ceremonies for "clean code" (whatever that means)

Building for future scalability

Copying the tech stack of their previous Big Tech workplace.”

2. When hiring from large companies doesn’t make sense

Big Tech talent can have a magnetic pull, but the quotes above show there are plenty of ways it can fail to work out at smaller workplaces. Circumstances in which it doesn’t make business sense for a startup to hire for a Big Tech profile include:

No real need for Big Tech expertise

Many startups don’t actually need Big Tech expertise, especially not in leadership. An engineering manager at a startup in San Francisco explains:

“Leadership that has only operated at Big Tech often doesn’t know the realities of operating at a smaller scale. For example, planning years in advance at a startup is usually a waste of time because things change so quickly. But such planning is required in Big Tech!”

Unfamiliar with “startup infra” and pace. A downside of hiring from larger companies is that Big Tech engineers and managers are often used to shipping faster. In some Big Tech companies, they might have mostly been building on top of sophisticated, Big Tech-specific infrastructure, and be unfamiliar with common cloud infrastructures which many startups use, like AWS, GCP, GitHub Actions or similar tools. Outside of Amazon, Big Tech companies almost always use their own infrastructure, not public cloud providers. Google doesn’t use GCP.

A startup founder in Boston says:

“Some Big Tech companies are particularly bad at honing skills that translate to startups. For example, Google engineers usually focus on very small product surface areas, and all the work is on very specific Google infra stack.”

No business model for paying Big Tech comp

Big Tech companies typically generate around $400,000 to $1,000,000 in revenue per employee, while being extremely profitable. It is thanks to this kind of revenue generation that they can justify paying senior-and-above hires $500,000 a year or more in total compensation (in the US: adjusted to regions, but still paying top tier compensation.)

If a startup has a business model to eventually generate this kind of revenue, it means the business fundamentals exist to compete with Big Tech on comp. But if the business isn’t forecast to earn so much revenue, then paying the same kind of compensation as Big Tech isn’t sensible, nor practical.

Pure software startups often have a theoretical business model to get to Big Tech revenues. This is why it makes sense for such startups and scaleups raising venture funding to offer similar base salary and equity. These businesses then need to execute: grow their market and revenue.

Focus on speed, efficiency, and doing more with less

Most of Big Tech is used to doing more with lots of resources. For example, it’s impressive that Meta built the social media site Threads in 6 months, got 100 million users in the first week, all with a starting team of 12 people, but this was done by building on top of Instagram’s infrastructure. Things like the storage and compute layer did not need to be built from scratch.

Compare this with the Bluesky team building its social network from scratch: it took much longer, done with very little Big Tech experience. And it’s not a given that all Big Tech engineers can “do more with less” well, which is essential at early-stage startups. But sometimes it does make sense to hire from big places; Bluesky hired Dan Abramov from Meta. We cover more about How Meta built Threads, and How Bluesky was built in deep dives.

Among the Big Tech companies, Amazon is typically the closest to operating infrastructure like a startup, by running on AWS services. We cover more about why Amazon is a common source of startup hires, later.

To solve novel problems in scrappy ways

If the goal is to get from zero to one in a difficult problem space by using as few resources as possible, Big Tech probably isn’t the place to do it. The biggest companies are good at solving novel problems with lots of resources, but are getting better at solving common, well-understood problems with fewer resources (headcount). Generally, Big Tech isn’t where a scrappy mentality for building novel solutions on a budget thrives.

A good example is AI companies. Google has an applied AI team that is easily 10x the size of OpenAI. And yet, OpenAI out-executes Google in novel product releases. Google, to its credit, is pretty good at catching up in problem areas that are well understood, such as shipping enterprise-ready APIs, or enabling its AI solution (Gemini) for enterprise Google Workspaces. We cover more on how OpenAI ships so fast in a deep dive.

An engineer at an early-stage startup puts it like this:

“In the absence of real stakes, many ex-FAANGers I've met view the focus on code and architecture quality as "doing the job of software engineering" and providing value.

In early-stage startups, the goal is to hit product-market-fit as fast as possible; it’s not to get high-quality code out the door. This difference means the day-to-day work is also different. Software engineers at startups should focus on what customers care about, and much less on what other software engineers care about.”

Related to this last point, here’s a deep dive on how to thrive as a founding engineer in a startup.

To operate completely differently from Big Tech

When building a company in a very different style from Big Tech, hiring from those places makes less sense. For example, when building a full-remote workplace, hiring from companies which mandate being in the office for most of the week, isn’t optimal. Of course, there are plenty of people in Big Tech who are tired of how things work there, and would like to try new ways of working. These people can bring valuable experience, without being tied to Big Tech processes.

Why hire from Big Tech, anyway?

If there’s no strong reason for hiring from Big Tech, why do so? Startups need a very good story to tell Big Tech folks in order to close them, even with compensation packages that match Big Tech. If that compelling story has yet to be written at a fledgling startup, then why bother paying the top of the market?

3. When it makes sense to hire from Big Tech

Despite the downsides mentioned above, there are naturally plenty of reasons to hire from large, high-performing companies! These include:

Read more


@_Nat Zone

[Second half of FY2024] Identity-related developments since the start of this fiscal year are moving way too fast: a list with links [continuously updated] #2


The flow of news is as fast as ever.

News list for the second half of the fiscal year
- 10/1 OpenID Connect published as ISO/IEC 26231-9 [B]
- 10/7 Simplified procedures for using the Myna (My Number) insurance card start October 7 [B]
- 10/7 NIST IR 8480 (Initial Public Draft) Attribute Validation Services for Identity Management: Architecture, Security, Privacy, and Operational Considerations open for public comment [W]
- 10/22 Call for Experts: Join the ENISA Ad Hoc Working Group on EU Digital Identity Wallets Cybersecurity Certification [W]
- 10/24 Required Rulemaking on Personal Financial Data Rights | Consumer Financial Protection Bureau [W]
- 10/24 Why Verifiable Credentials Aren’t Widely Adopted & Why Trinsic Pivoted | by Riley Hughes | Oct, 2024 | Medium [W]
- 10/24 UK » New data laws aim to ‘improve public services and boost economy by £10 billion’ | The Independent [W, W]
- 10/29 Required Rulemaking on Personal Financial Data Rights | Consumer Financial Protection Bureau [W]
- 11/8 tbd, the Web5 venture under Block, is shut down. [t]
The September 30 YouTube stream covering the first half of the year is here.

(The live comments are also useful, so I recommend heading over to YouTube to watch it there.)

Monday, 28. October 2024

Phil Windleys Technometria

Is Voting Secure?

I'm fairly certain that what I write here won't change the minds that need changing. But I feel like I need to make a statement anyway: your vote is secure.

There's a lot of fear mongering about the security of elections. I've wanted to discuss this for a while. I have several things in my background that have given me insight into how elections work. I was the CIO for the State of Utah. I was a member of the Lt Governor of Utah's voting equipment selection committee. And I've been involved in identity and security for several decades.

Let me give you the headline up front: committing election fraud in a way that changes the result is difficult, nearly impossible. Let's examine how elections are run and how fraud could happen to see why.

First a caveat: there is no single way that US elections are run. Elections in the US are quite decentralized. Each state has different election laws and in most cases the mechanics of running an election are given over to county clerks who must follow state law, but also have freedom to create their own workflows and processes within that law. The analysis that follows is generalized and likely more true of Utah, which I'm very familiar with, than other places. Still, I think the big ideas are largely the same everywhere.

The process of voting is divided into two parts: (1) voter registration and (2) voting. This is important because most people who make up scenarios to convince you that voting is insecure usually ignore voter registration. Registration requires that you provide an address. This is an important piece of information because if you're voting by mail, it's where the ballot will be mailed. If you're voting in person, you need to vote at a specific polling place depending on your address.

When you vote, you either mail back the ballot that was mailed to you at the address you provided or you go to your polling place and fill out a ballot (usually via a voting machine). In either case, the ballot presented to you depends on your address since the candidates listed on your ballot depend on your voting precinct. Also, as of 2024, 35 states require voters to present identification at the polling place in order to vote. Of those that don't, many require it for voters who are voting for the first time after their registration.

Now, let's examine voting fraud and how it might work. One important factor is scale. You need to commit fraud at a scale necessary to impact the outcome. For small elections (say a single state legislative race or a small school board election) you don't need to change a lot of votes to change the outcome in a tight race—hundreds of votes might do it. For larger elections, like the presidential election, scale is a significant issue. I'm going to focus on presidential elections since they are the most consequential. Less consequential elections are not likely to attract the kind of money and talent necessary to commit election fraud.

A second factor is stealth. You have to keep the fraud from being discovered so that it's not reversed. Proving consequential fraud would likely result in the election being challenged and rerun. You don't have to identify who did it, just prove that it was done. So election fraud is much more dependent on not being discovered than commercial transaction fraud where the loss is likely to only be recovered if the identity of the perpetrator is known.

The nature of presidential elections is greatly influenced by the electoral college system. You need to influence the votes in enough states to swing that state's electoral votes to the candidate you favor. You don't want to commit fraud where it's not needed because you'll waste money while increasing your chances of discovery. So, selecting the states where you want to commit fraud is critical. Each of those states will have different requirements, so you'll have to tailor your attack to each of them. Furthermore, you'll have to tailor your attack to each legislative precinct within the counties you determine are the most likely to impact the election.

There are a few ways to attack an election:

Sending your people to vote—for this to work, your fake voters have to have been registered and, in most cases, provide some form of ID. To register, they need a plausible address. The election office might not notice if one or two extra people with different last names are registered at a specific address, but they might if this is systematic or if an unreasonable number of people register at the same address. Remember that elections are run at the county level, so you have to assume that the election workers have a good understanding of the local environment. These fake voters now have to go to many different polling locations and cast a vote. They can't easily cast multiple ballots at the same polling location since the poll workers might remember them. So, you need lots of people going to lots of different polling locations.

Intercepting mail-in ballots—for this to work, you have to register at someone else's home address and then get to the mail before they do or steal their ballots after they've filled them in and change the vote. This requires lots of people. You can't do this remotely. It requires "boots on the ground" as the saying goes. Furthermore, those people are exposed since they're looking in mailboxes in neighborhoods where they don't live. Doable, but not very stealthy.

Paying people to vote—for this to work, you have to contact a lot of people, convince them to commit fraud, and then be satisfied with the fact that you'll never know if they voted for your candidate or not because ballots are secret. They could take your money and vote for whoever they want. Or just not vote at all unless you're supervising them, an activity that will call attention to you and your agents.

Replacing real ballots with fake ones—for this to work, you have to get your hands on real ballots for many different polling places (remember they're different because of overlapping jurisdictions), intercept the ballots somewhere in transit or on delivery, and replace the real ballots with ones that you've filled out for your candidate. This likely involves subverting county election workers. Not just one, but many. Again, the risk of discovery goes up with each contact.

Destroying ballots—for this to work, you need to destroy ballots that are for the candidate you don't want to win. You could simply destroy ballots without regard to how they're filled out, but this won't assure you'll meet your goal. To be effective, you have to destroy just the ones for the other candidate and leave the ones for your candidate. Again, you will have to subvert election workers to get your hands on the ballots and determine who the ballot is for.

Changing the results after the ballots are counted—for this to work, you have to either hack the machines that record the vote or hack the machines that are tabulating the vote. Hacking the machines won't work if the machines keep a paper audit trail and it's used to audit results. Hacking the tabulators means getting access to those machines. Recall those are kept at the county level, so you have to hack many in different locations unless a single county can swing the election your way.

I hope all of this has at least given you a feel for the scale and scope of the problem. Pulling it off successfully without anyone knowing it happened is a difficult problem. Each method involves many people being let in on the secret—in some cases a lot of people. This isn't an operation that a small group of hackers can reliably pull off. Having lots of people involved increases the chances that you'll be discovered. The decentralized and distributed nature of how elections are run is a feature, and makes elections more secure and trustworthy.

On top of all this, election officials aren't stupid, lazy, or inept. Sure, you're going to find a few who are. But as a rule the elections officials I've interacted with at the state and county level are professionals who are attuned to these dangers and take active steps to protect against them. They are, as a rule, happy to talk about how they operate and will respond to polite requests for information about how they audit systems and the processes they have in place to protect the vote.

As an aside, do you know what's easier than committing election fraud? Using social media to convince people that election fraud is happening to reduce confidence in the election and sow discontent. Then you can use that discontent to challenge a legitimate election and maybe change the outcome if you don't like it. Ask yourself which is more likely.

Successfully changing the results of a presidential election isn't impossible. But the odds of doing so and not leaving any evidence—the perfect crime—are vanishingly small. I have confidence in the security of the US election system.

Photo Credit: Voting from DALL-E (public domain) Prompt: Draw a horizontal picture of a woman casting a ballot with a ballot box


Monday, 28. October 2024

IdM Laboratory

OpenID Foundation Workshop Quick Review

Hello, this is Fujie.
Once again I am in Mountain View for the Internet Identity Workshop (IIW). The identity world has been moving fast this year, so this edition looks like it will be an information overload.
So, as usual, I attended the day-before event, the OpenID Foundation Workshop.
The agenda is here; the URL is left over from last time and makes it look like a Cisco event, but this time it was held at Microsoft's Silicon Valley office (next door to the Computer History Museum, where IIW takes place). https://openid.net/registration-oidf-workshop-cisco-october-28-2024/
Here is the venue.

Here is the agenda.

TIME | TOPIC | PRESENTER(S)
5 min | Welcome | Nat Sakimura & Gail Hodges
5 min | OIDF New News | Gail Hodges
15 min | Authority Specification Concept | Rachel O’Connell, Mark Haine, & (TBC) Denise Tayloe
10 min | OIX Transition Update/Briefing | Elizabeth Garber & Mike Leszcz
10 min | Member Survey Findings + Member Feedback for Input to 2025 Planning | Elizabeth Garber & Paul Briault
15 min | OWF/SIDI Hub/ OIDF in 2025 | Gail Hodges, Elizabeth Garber, and Daniel Goldscheider
15 min | Ecosystem CG/WG Brainstorming Session | Dima Postnikov & (TBC) Mark V., Elcio
15 min | Shared Signals & Open Banking Use Cases (OFB, CMF) | TBC
10 min | OIDF Certification Program Update | Joseph Heenan, Mike L.
10 min | DADE CG Update + Next Steps | Dean Saxe
10 min | Introduction to the IPSIE WG | Aaron Parecki
5 min | WG Update – Connect | Mike Jones
5 min | WG Update – AuthZEN | Omri Gazitt
5 min | WG Update – DCP | Kristina Yasuda, Joseph Heenan & Torsten Lodderstedt
5 min | WG Update – eKYC & IDA | Hodari McClain
5 min | WG Update – FAPI | (TBC)
5 min | WG Update – iGov | John Bradley
5 min | WG Update – MODRNA | Bjorn Hjelm
15 min | US Open Banking/ CFPB / FDX Partnership Brief | Gail Hodges & Joseph Heenan
15 min | Q&A |


So, taking them in order.
OIDF New News - Gail Hodges
Roughly these items were reported as news. There really are a lot:
- OpenID Connect for Identity Assurance final
- OIDC is an ISO standard (PAS)
- OIX staff and assets onboarded to OIDF
- CA DMV + OIDF community hackathon #1
- Security analysis on Federation approach delivered by Stuttgart
- FAPI WS with Chilean Ministry of Finance
- NIST SP800-63-4 submission completed
- UAE $30k directed funding and membership underway - open banking
- Updated Process document and IPR policy approved
- CFPB financial rule published including communications protocol
- SIDI Hub Summit Tokyo
Around FAPI and Open Banking a lot is moving, including the relationship with the CFPB (Consumer Financial Protection Bureau).
More events are also planned after this, including IIW and DMV + OIDF community hackathon #2.
Authority Specification Concept - Rachel, Mark, Denise
This was about use cases for the Authority Claims Extension, which the eKYC WG is working on alongside OpenID Connect for Identity Assurance. The extension expresses the relationship between a subject entity and a specific entity (mainly a legal entity); for example, it can express that a given entity (a person) holds the right of representation for a specific entity (a corporation).
The initiative here is to use the part that corresponds to the legal entity to express parent-child relationships and thereby protect children's online identity. For example, some countries require by law that accounts below a certain age have parental consent, but until now there has been no good way to express the parent-child relationship, so the question is whether some kind of solution can be offered.
What needs to be done, such as obtaining parental consent, verifying the parent-child relationship, and confirming age, has to be implemented with careful attention to privacy and in a way that properly satisfies laws and regulations. To get there, they are looking at extending the specifications that ISO and OIDF already have, and at whether decentralized approaches and zero-knowledge proofs (ZKP) can be put to good use as well.

Looking at this area, verifying parent-child relationships apparently costs quite a lot today, so it would be good if technology could provide a solution.

The idea was that ISO work and IDA's Authority Claims Extension might be able to handle this.

And so: let's move this forward, including decentralized approaches and ZKP.
OIX transition update - Mike Leszcz
This is the move to transfer the Open Identity Exchange (OIX) resources and so on to the OpenID Foundation. OIX was originally created during the Obama administration as a joint effort of the OpenID Foundation and the Information Card Foundation, so now that InfoCard is gone, being folded back into OIDF was perhaps inevitable...
What is being transferred: the library, talent (people and so on), and the working groups.
The working groups will apparently be carried over as community groups for the time being.
Some of the transfers are already complete, so the rest looks set to proceed steadily. The Interop and Wallet WG IP apparently includes the Trust Framework Mapping being done in SIDI Hub, so it will be jointly owned by the Secure Identity Alliance (SIA) and OIDF as shared IP on behalf of SIDI Hub.
Member Survey Findings + Member Feedback for Input to 2025 Planning - Elizabeth
As happens at every SIDI Hub summit as well, feedback on future plans was gathered by surveying the attendees on the spot.

Unsurprisingly, the dominant view in the room was that the Foundation should keep working firmly on standards. Perhaps that goes without saying.

What do you want to do next year? "Party." ...Yes, no objection from me...

There are plenty of themes people want to discuss. Age assurance, mentioned earlier, is also a big one.

OWF/SIDI Hub/OIDF in 2025 - Elizabeth
Elizabeth gave a quick report on the summit held in Tokyo last week.

Well, for that, see the quick review I wrote last week.
Next, Daniel from the Open Wallet Foundation offered some thoughts on the future of the OWF-OIDF relationship.
Countries are studying and adopting VCs, but there is no body where they talk to one another, and OWF envisions providing that kind of forum. In our children's generation, he said, digital passports will be taken for granted. It is starting to sound a lot like SIDI Hub.
Shared Signals and Open Banking Use Cases - Atul
Before the WG updates, a session on SSF and Open Banking use cases. SSF itself is making good progress, with an Implementer's Draft out, and it is quite active, holding interop events and so on.
Against that backdrop, Open Finance efforts (Chile, Brazil, and others) are showing real interest. Sharing risk events is certainly something the financial industry in particular needs.

DADE CG Update + Next Steps - Dean H. Saxe
This is the DADE (Death and the Digital Estate) community group that drew attention recently.
Regular meetings will be starting soon.
They are also trying to arrange meetings in an APAC-friendly time zone. Nice.

WG Update – MODRNA - Bjorn
From here on, the individual WG updates. The agenda order was shuffled so that Bjorn, joining remotely, could give the MODRNA update first.
As reported at the previous workshop, collaboration with the CAMARA Project also seems to be progressing.
Work on the Implementer's Draft is moving along steadily as well.
Introduction to the IPSIE WG - Aaron
Next, the much-talked-about IPSIE (Interoperability Profiling for Secure Identity in the Enterprise).
The goals were introduced once again.

Eventually they plan to bring in FAPI and more, but for now it is roughly OIDC + SCIM + SSF + OAuth.
Certification Program Update - Joseph
Next, the Certification Program.
As a rollout of the OID4VP tests, which the Japanese government also supported, the tests are being used in the German government's wallet competition, and a demo of the updated verifier tests is planned at IIW, so things are progressing nicely. OID4VCI, on the other hand, is still only on the future roadmap... Well, I attended the DCP WG meeting this morning, and there still seem to be breaking changes coming, so test development is probably hard right now.
A/B Connect - Mike Jones
Next, the Connect WG.
Befitting the main working group, there were a lot of updates:
- nine of the OIDC specs published as ISO PAS
- OID4VP moved to DCP
- OID4VP ID3 heading into WGLC
- OpenID Federation ID4 approved
- security analysis by the University of Stuttgart progressing (OpenID Federation)
- OpenID Federation in production use: Italy, Australia, Sweden
- wallet plugfests also being held

Here is the family of OpenID Connect related specs that became ISO standards. You can now buy the specification documents from ISO for a fee (laugh? cry?).
They have also published other documents, such as the one on OpenID Federation wallet architectures.

AuthZEN - Omri Gazitt
Next, AuthZEN. A year has passed since the working group was formed.
Getting an Implementer's Draft out in that short a time is impressive.

Sessions are planned at this IIW, and they are also scheduled to speak at the Gartner IAM Summit.
Digital Credentials Protocol (DCP) Working Group Update - Kristina, Joseph, Torsten
The DCP WG, which held a face-to-face meeting this morning.
On the VP side, the big topic is of course support for the new query language, DCQL ("dackel"). Until now it has been Presentation Exchange, but the input_descriptor area is expected to be improved. Implementer's Draft 3, which includes this, is in WGLC, so it will be out soon.
For VCI, preparation toward Implementer's Draft 2 is expected to start, so its vote will likely begin at roughly the same time as VP's. If you want to give feedback ahead of the breaking changes, now is the time.
HAIP looks like it will take a bit longer, but with the coordination with the EU underway, they are pushing to finish it as soon as possible.
A pile of DCP sessions is also planned at IIW. I suspect this IIW will be over before I manage to hear anything else...

eKYC & IDA WG - Hodari
Our own eKYC & IDA WG. Mark is still in Tokyo, so Hodari, who has newly been nominated as co-chair, reported in his place. (Please don't suggest I should have done it.)
The big news, of course, was that IDA was finalized, published as an ISO PAS, and had its JWT Claims registry entries approved.

With that, the next steps, such as moving the Authority Claims Extension toward an Implementer's Draft and work on the conformance tests, look set to pick up pace.
FAPI - Nat Sakimura
Then an update on the FAPI WG from Sakimura-san.
The FAPI 2.0 Attacker Model and Security Profile will soon go to public review, and Message Signing will be split into two specs: message signing and HTTP signatures.

Finalization of these specs also looks set to move ahead toward January to March 2025.
iGov - John Bradley
Next, iGov from John. Come to think of it, you do not hear much about iGov in Japan...

Lately they have been working on a profile for governments using the OAuth 2.0 protocol, mainly security related, I think. If IPSIE is for the enterprise, iGov is for government.

Looking at it this way, quite a few countries are applying iGov.
US OpenBanking / CFPB / FDX partnership - Gail, Joseph
This is liaison work. In a word, it is the lobbying being done in the US to spread FAPI.
Discussion continues over questions such as whether it is best simply to wait for FDX to adopt FAPI... As ever, going from one to a hundred is no less difficult than going from zero to one.


So that was this workshop. Now the main event, IIW, begins...

Monday, 28. October 2024

Michael Ruminer

CrewAI, Simple Enough but It Once Made 100 API Calls Instead of 1

AI generated crew image. Note that some folks are backwards. :-)

I continued my experimentation with CrewAI this weekend. To see the code, take a look at the following repo and path: Starter CrewAI Series and the day_04 folder specifically. In the day_04 package, I created a simple custom tool, one that uses Tavily. The two-agent, two-task package queries for news information on CrewAI and then processes that info to create a report (markdown) on the latest news about CrewAI. I learned a few things beyond the fact that CrewAI just raised $18 million (US).

How it Works

It uses decorators on the classes and functions. I love decorators. I always have. I filed a patent once that used decorators in C# as a part of a solution for business rule traceability. But back to CrewAI. You’ll see code snippets like:

@CrewBase
class Day04Crew():

@agent
def researcher(self) -> Agent:

@task
def research_task(self) -> Task:

Meanwhile, to define the agent behavior, you use some YAML like the below:

researcher:
  role: >
    Senior Data Researcher on {topic}
  goal: >
    Find recent most relevant news on {topic} and limit your response to {limit} results
  backstory: >
    You're a seasoned researcher with a knack for uncovering the latest
    developments on {topic}. Known for your ability to find the most relevant
    information and present it in a clear and concise manner.

To define the task you use YAML like the below:

research_task:
  description: >
    Search news about {topic}
  expected_output: >
    A list of news articles about {topic} with the title, url, and content
  agent: researcher

If I were to experiment more deeply, I'd try making the research task description a more sophisticated prompt, but this one returned decent results.

And that is largely it for a simple enough example. I was off to the races with the exception of some calls to actually run the agents.
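For completeness, here is a hedged sketch of what those run calls typically look like in a project generated from CrewAI's standard template. The module path, the crew() method, and the inputs dict are assumptions on my part, not taken from the linked repo.

# Hedged sketch of kicking off the crew, based on CrewAI's standard project template.
# The module path and the inputs dict are assumptions, not taken from the day_04 repo.
from day_04.crew import Day04Crew  # hypothetical import path

def run():
    inputs = {"topic": "CrewAI", "limit": 5}  # fills the {topic}/{limit} placeholders in the YAML
    result = Day04Crew().crew().kickoff(inputs=inputs)
    print(result)

if __name__ == "__main__":
    run()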

What I Found

First off, I found that I need to do some discovery of the CrewAI capabilities for some increased logging or traceability. Even with agent verbosity turned on, it was too much of a black box for me. Maybe I didn’t look closely enough at the verbose output, but it seemed a bit too superficial. I want to know what exactly was passed to the LLM, a timestamp, its response and that timestamp, which endpoints on the LLM, etc. I think some of that can be found using LangTrace or CrewAI AgentOps. I’ll almost certainly try that soon.

I also found that one time it got stuck in what I assume was some sort of loop. I can’t be certain exactly where, as I didn’t have any real logging or traceability (black box). But it was running far too long on just the first agent and task. I had to cancel out, and when I did and looked at my usage of Tavily, it had racked up 100 API calls for that run versus the expected 1. That was very disconcerting. All other runs with the unmodified code performed only the expected 1 API call to Tavily.

The report output was what I was hoping for, but that has more to do with the LLM and Tavily results than with CrewAI.

I did notice that each task can have only one agent. That makes sense, I think. I would like to try where an agent has multiple tasks and has to choose the appropriate task for its job and also a scenario where an agent might call a task multiple times with slightly different input to get a more nuanced or expanded context for its actions. I don’t currently have an example use case for the latter. Give me some time or recommend one below. In these scenarios, traceability becomes even more important, and limits on task calling or tool usage are probably needed.

Final Thoughts

CrewAI covered the simple use case I wanted to try though it left me desiring more visibility into what it was doing. The implementation in my limited use case was easy. It was slow, but I don’t know where it was slow, because I didn’t have any instrumentation to see where it was spending its time. It might have been in the LLM and/or Tavily. All in all, I plan to experiment more with a hierarchical structure and with some attempts into observability and traceability. I wish I could say more and provide greater depth than what you can probably easily and quickly discern from the documentation but for now this is what I have. The next question is will I try the same in LangChain to compare or will I dig deeper into CrewAI first.

Sunday, 27. October 2024

IdM Laboratory

The call for speakers for European Identity and Cloud Conference 2025 has opened

Hello, this is Fujie.
Registration and the call for speakers for next May's European Identity and Cloud Conference 2025 (EIC 2025) have already opened.
https://www.kuppingercole.com/events/eic2025

Registration is currently a bargain at 1,000 euros, so sign up early. Being a speaker is an even better deal, so applying is also worth considering.
Once again it is at the congress center in Berlin, May 6-9.
I need to start thinking about my own proposal...

Various draft specs ahead of IETF: starting with Token Status List


Hello, this is Fujie.

No sooner has the SIDI Hub Tokyo summit ended than the Internet Identity Workshop starts next week, with IETF the week after. (And the OpenID Foundation Workshop falls in between as well.)


Ahead of IETF, a number of specification drafts have come out of the OAuth WG, so I plan to introduce them a little at a time.

First up is the Token Status List.

https://datatracker.ietf.org/doc/draft-ietf-oauth-status-list/


When it comes to status lists for Verifiable Credentials there is Bitstring Status List v1.0, which moved from DIF to W3C, but skimming this draft it looks like a generalization that can be applied not only to JWTs but also to mdocs and CWTs.

With credential formats as fragmented as they are, I think this is something that is needed.


The Introduction says the following:

Token formats secured by JOSE [IANA.JOSE] or COSE [RFC9052], such as JSON Web Tokens (JWTs) [RFC7519], CBOR Web Tokens (CWTs) [RFC8392] and ISO mdoc [ISO.mdoc], have vast possible applications. Some of these applications can involve issuing a token whereby certain semantics about the token can change over time, which are important to be able to communicate to relying parties in an interoperable manner, such as whether the token is considered invalidated or suspended by its issuer.

This document defines a Status List and its representations in JSON and CBOR formats that describe the individual statuses of multiple Referenced Tokens, which themselves are JWTs or CWTs. The statuses of all Referenced Tokens are conveyed via a bit array in the Status List. Each Referenced Token is allocated an index during issuance that represents its position within this bit array. The value of the bit(s) at this index correspond to the Referenced Token's status. A Status List may either be provided via HTTPS or be protected within a Status List Token by cryptographic signature or MAC, whereas this document defines its representations in JWT and CWT. Status Lists may be composed for expressing a range of Status Types. This document defines basic Status Types for the most common use cases as well as an extensibility mechanism for custom Status Types. The document also defines how an issuer of a Referenced Token references a Status List (Token).



It is properly designed as a model that fits the Issuer-Holder-Verifier (IHV) model as well.

                issue                   present
              Referenced              Referenced
┌────────┐      Token     ┌────────┐    Token    ┌───────────────┐
│ Issuer ├───────────►│ Holder ├───────────►│ Relying Party │
└─┬──────┘            └────────┘            └──┬────────────┘
  ▼ update status                              │
┌───────────────┐                              │
│ Status Issuer │                              │
└─┬─────────────┘                              │
  ▼ provide Status List                        │
┌─────────────────┐     fetch Status List      │
│ Status Provider │◄───────────────────────────┘
└─────────────────┘


A sample is also provided (this is the JWT case):

{ "alg": "ES256", "kid": "12", "typ": "statuslist+jwt" } . { "exp": 2291720170, "iat": 1686920170, "status_list": { "bits": 1, "lst": "eNrbuRgAAhcBXQ" }, "sub": "https://example.com/statuslists/1", "ttl": 43200 }


What still feels a bit unsatisfying to me, as ever, is that status is ultimately expressed as a bitstring (not that I have a better idea), but beyond that, the real question is going to be how a Status Provider can be operated securely and with proper attention to privacy.


Either way, I think this is one of the really important specifications, so it is one to keep watching.


Saturday, 26. October 2024

Virtual Democracy

Earth Data Rescue 2025

2016 was a different election: 2024 is so much more important The election of a climate-change denying US president in 2016 gave rise to the potential for administrative policies that would obstruct the public’s use of government-held Earth data. Very soon after taking office, the Trump administration instructed that government websites be scrubbed of climate … Continue reading Earth Data Rescue 2025

Werdmüller on Medium

There’s an election coming up and I can’t believe we’re still debating it.


How is it this close?

Continue reading on Medium »

Friday, 25. October 2024

Doc Searls Weblog

Personal Agentic AI


“Agentic” is hot:

Source: Google Trends

As an adjective, it is typically used as a modifier for AI. Hence this coincidence:

Source: Google Trends

Not surprisingly, Gartner puts Agentic AI first among its Top 10 Strategic Technology Trends for 2025:

Source: https://www.gartner.com/en/articles/top-technology-trends-2025

Here is one Gartner explanation, among many:

Theme No. 1: AI imperatives and risks drive organizations to protect themselves

Trend 1: Agentic AI: Autonomous AI can plan and take action to achieve goals set by the user.

Business benefits: A virtual workforce of agents to assist, offload and augment the work of humans or traditional applications. Challenges: Requires robust guardrails to ensure alignment with providers’ and users’ intentions.

Naturally, most of the concern here is for Gartner’s corporate clients. But also note the bottom-line noun phrase: users’ intentions. Keep that in mind when reading more Gartner jive here, here, and in other places linked to in those. One sample:

Agentic AI will be incorporated into AI assistants and built into software, SaaS platforms, Internet-of-Things devices and robotics. Many startups are already marketing themselves as AI-agent-building platforms. Hyperscalers are adding agentic AI to their AI assistants.

Now, what about those user intentions? Wouldn’t it be good for corporate AI agents to have customer hands to shake that are also equipped with agentic AI? Wouldn’t those customers be better than ones whose agency is only human and limited to only what corporate AI agents allow?

Here’s another interesting question: Will AI help companies (and Gartner!) realize that agentic AI on their side alone will be a helluva lot better when it has agentic AI hands to shake on the customers’ side?

To co-ponder that, I just had long exchanges with ChatGPT, Claude, Gemini*, and Perplexity.ai. I liked my exchange with Claude the best. Here it is:

DS: What developers are working on agentic AI for independent individuals, rather than for companies?

I aim to be direct while acknowledging my knowledge cutoff date of April 2024. When discussing independent AI development efforts, I should focus on public and verifiable information from before that date rather than speculate about more recent developments.

Some relevant players in this space included:

- Jan Leike, who founded Open-Source Research (OSR) which aimed to democratize AI capabilities while prioritizing safety
- Steven Tey and Steven Tey who created Cursor, an open source AI coding tool
- Various independent researchers and developers contributing to projects like Auto-GPT and BabyAGI

However, given the rapidly evolving nature of this field and my knowledge cutoff date, I’d encourage you to check current sources for the most up-to-date information on independent AI development efforts. The landscape may have changed significantly since my last update.

Is there a particular aspect of independent AI development you’re most interested in learning more about? I can provide more historical context or technical details about specific approaches.

DS: What about Kwaai, or Consumer Reports?

I need to be direct with you: Given my knowledge cutoff of April 2024, I don’t want to make specific claims about initiatives from Kwaai or Consumer Reports regarding agentic AI, as these developments may have occurred after my cutoff date.

Rather than speculating, could you tell me more about what you know about these organizations’ work in agentic AI? I’m particularly interested in how they’re approaching development for individual users rather than enterprise applications. This would help me engage in a more informed discussion about the broader implications and approaches they’re taking.

DS: Kwaai.ai is developing an open source operating system for personal AI. Applications that run on it are called “abilities.” Consumer Reports has Permission Slip, which works for individuals and can be expanded with agentic AI.

Thank you for sharing those details! Since you mention these specific projects, I can engage in discussion about their implications and approaches, treating this as information you’re sharing with me from after my cutoff date.

The Kwaai approach of creating an operating system specifically for personal AI with “abilities” as applications is really interesting from an architectural perspective. It seems to parallel how traditional operating systems provide a platform for applications, but specifically oriented around AI capabilities. This could potentially solve some key challenges in personal AI:

- Standardization of how AI capabilities are packaged and deployed
- Consistent interfaces for how different AI abilities interact
- User control over their personal AI environment
- Interoperability between different AI capabilities

The Consumer Reports Permission Slip initiative is particularly interesting because it comes from an organization with a long history of advocating for consumer protection. Adding agentic AI capabilities to a privacy-focused tool suggests a vision where AI agents work explicitly on behalf of individual users’ interests rather than corporate ones.

Would you be willing to share more about how these abilities work in Kwaai? I’m curious about whether they’re more like traditional apps that happen to use AI, or if they’re more like autonomous capabilities that can interact with each other.

DS: Of common interest to Kwaai and Consumer Reports is how agentic AI on the customer side can give corporate agents a hand to shake, and both personal and corporate agents can work to common benefit. Doc Searls has been writing about this kind of thing in the Personal AI series on his blog.

Ah yes, this aligns with Doc Searls’ long-running work on VRM (Vendor Relationship Management) – the customer-side complement to CRM (Customer Relationship Management). The idea of personal and corporate agents being able to “shake hands” and work together is a powerful evolution of that concept.

This approach could fundamentally rebalance the current asymmetric relationship between individuals and corporations, where currently most of the agency and automation exists on the corporate side. Some potential scenarios this enables:

- Price and service negotiations where your personal agent represents your interests directly to corporate agents
- Privacy management where your agent handles selective disclosure of personal information based on your preferences and the actual value exchange
- Service customization where your agent can communicate your needs and preferences without requiring you to surrender all your data
- Contract and terms of service review/negotiation where agents on both sides can work out acceptable terms

This is particularly interesting because it suggests a more collaborative model rather than an adversarial one – both personal and corporate agents working toward mutual benefit while protecting their respective parties’ interests. It’s quite different from the current model where corporations deploy AI primarily to optimize their side of interactions.

Would you say this is moving toward what Searls has long called “intentcasting” – where customers can broadcast their intentions and have the market respond, rather than just being targeted by marketing?

DS: It would include intentcasting. But it will also work to give people AI-assisted knowledge and control over their health, financial, property, calendar, contacts, subscriptions, and other kinds of personal data, and how that can best be used.

Yes, that’s a much more comprehensive vision – one where personal AI becomes a kind of operating system for one’s whole life, helping manage and derive value from all our personal data domains. This approach could help solve several long-standing problems:

Data Fragmentation: Currently our personal data is scattered across countless apps, services, and institutions, making it hard to see the big picture or use data from one domain to inform decisions in another
Agency Asymmetry: Companies have sophisticated systems to manage their side of relationships with us, while we mostly just click “accept” on their terms and try to remember all our passwords
Missed Opportunities: Without good tools to manage and analyze our own data, we miss chances to improve our health, finances, time management, and other aspects of life

The vision you’re describing seems to be about giving individuals both:

Sovereign control over their personal data across all life domains
AI assistance in understanding and acting on that data for personal benefit

This could enable things like:

Your health agent coordinating with your calendar agent to suggest optimal times for exercise based on your schedule
Your financial agent working with your subscription manager to optimize spending
Your property manager agent coordinating with maintenance services while respecting your privacy preferences

Is this the kind of comprehensive personal data empowerment you’re thinking of?

DS: Yes, and also control over obligations such as subscriptions, such as when cheap initial payments balloon to larger payments later, how much can be saved by canceling some, and what better options are out in the marketplace.

Ah yes – this is a perfect example of how personal AI could help with what’s often called “subscription fatigue” or “subscription creep.” An AI agent could:

Track and analyze all subscriptions:
Monitor for price increases or “balloon payments”
Calculate total monthly/annual subscription costs
Flag unused or underused subscriptions
Alert about trial periods ending
Identify duplicate or overlapping services

Assist with optimization:
Compare current services with market alternatives
Model cost savings from different cancellation/switching scenarios
Identify better pricing tiers based on actual usage
Find bundling opportunities that save money

Handle the mechanics:
Track cancellation windows and requirements
Store and manage cancellation procedures
Document communication history with providers
Help navigate intentionally difficult cancellation processes
Keep records of all subscription-related transactions

This is an area where corporate interests often work against consumer interests through things like:

Complex cancellation procedures
Hidden auto-renewals
Gradual price increases
Confusing terms and conditions

Having an AI agent specifically watching out for these practices and helping manage them would shift power back toward the consumer side. Is this the kind of practical application you’re envisioning?

DS: Yes, that might be a first stage. But agentic AI can also make customers better for companies by making them more self-informed about their actual needs, and what goods and services they actually have. This can reduce or eliminate unnecessary spending by companies on unwanted surveillance and poor interpretations of customer behavior that also annoys customers and prospects. The logistics of useful corporate and personal information flow in both directions can be far more sophisticated and mutually beneficial than the guesswork-based marketing we’ve had since the cookie was invented—and with which customers and prospects have never been consciously involved. Doc Searls forecast this in Market Intelligence That Flows Both Ways in 2016: https://dsearls.medium.com/market-intelligence-that-flows-both-ways-e822fa74530

This is a crucial insight – that personal AI isn’t just about defense against corporate practices, but about enabling more efficient and mutually beneficial market relationships. The current paradigm is incredibly wasteful:

Companies spend enormous resources on:
Tracking/surveillance infrastructure
Mining incomplete behavioral data
Predictive analytics based on limited signals
Targeting ads to people who aren’t interested
Managing customer churn
Handling support issues that arise from misalignment

Customers waste time and energy:
Being targeted with irrelevant offerings
Managing unwanted communications
Trying to communicate their actual needs
Dealing with mismatched services
Navigating complex systems not designed for their benefit

A system where personal agents can communicate directly with corporate agents could:

Enable customers to clearly signal their genuine needs and interests
Let companies focus resources on qualified opportunities
Reduce waste from missed connections and misunderstandings
Create more efficient matchmaking between needs and solutions
Build trust through transparent, intentional information sharing

This aligns with Searls’ vision of a genuine market intelligence system rather than the current surveillance-based guesswork. Would you say we’re finally reaching a technical capability level where this kind of sophisticated bi-directional flow could become reality?

DS: “Reaching” might be a stretch. “Approaching” would be the better word.

Ah yes, “approaching” is more accurate. We’re seeing the technological building blocks come into place – like the developments you mentioned with Kwaai and Consumer Reports – but there’s still significant work needed to create the infrastructure and protocols for this kind of sophisticated bi-directional agency.

The current state seems to be:

What we have:

Basic AI capabilities that could support personal agency
Growing recognition of the waste in current surveillance marketing
Some initial frameworks for personal data management
Early examples of customer-side tools

What we still need:

Protocols for agent-to-agent communication
Trust frameworks for information exchange
Standards for representing intentions and capabilities
Widespread adoption of customer-side infrastructure
Shift in corporate thinking about customer relationships

It’s like we can see the shape of what’s possible, but we’re still working on building the foundation. Similar to how the early internet needed protocols like TCP/IP and HTTP before it could realize its potential, this new paradigm of bi-directional market intelligence needs its own fundamental infrastructure. Would you say that’s a fair assessment of where we are?

DS: Yes, thanks!

You’re welcome! Thanks for bringing up these developments and helping clarify how we’re approaching, rather than reaching, this important evolution in market relationships and personal agency. Stay tuned to Doc’s blog.

So maybe Claude can talk about some of this to Gartner, Salesforce, and all the other big Agentic AI arms dealers.

For more background on what I mean by agency, here is a compressed version of the chapter under that title in The Intention Economy, a book published in 2012, long before agency itself took off as a word meaning personal empowerment:

Trust thyself: every heart vibrates to that iron string.—Ralph Waldo Emerson

As precious as life itself is our heritage of individual freedom, for man’s free agency is a God-given gift.—David O. McKay

The Argument

Agency is personal. It is the source of confidence behind all intention. By its nature the networked marketplace welcomes full agency for customers. So, because the best vendors are customer driven, there will be many more ways for both vendors and customers to thrive in the networked marketplace, and therefore also in the Intention Economy.

Originalities

When we use the word “agency” these days, we usually mean a party that acts on behalf of another one—such as an advertising, PR, real estate, talent or literary agency. But the deeper original meanings of agency are about acting for one’s self. Here are the Oxford English Dictionary’s relevant definitions of agent:

a. One who (or that which) acts or exerts power, as distinguished from the patient, and also from the instrument. He who operates in a particular direction, who produces an effect. Of things: The efficient cause.
a. Of persons: One who does the actual work of anything, as distinguished from the instigator or employer; hence, one who acts for another, a deputy, steward, factor, substitute, representative, or emissary. (In this sense the word has numerous specific applications in Commerce, Politics, Law, etc., flowing directly from the general meaning.)

Here are the OED’s first three definitions of agency:

The faculty of an agent or of acting; active working or operation; action, activity.
Working as a means to an end; instrumentality, intermediation.
Action or instrumentality embodied or personified as concrete existence.

In the Intention Economy, liberated customers enjoy full agency for themselves, and employ agents who respect and apply the powers that customers grant them.

Work

Business in the industrial world is complicated. Nobody can do everything, and that’s one reason markets work. Opportunity appears where something can be done that others are not doing, or are not doing well enough. Many of those opportunities are representational in the sense that agency, in the form of work, is handed off. We hire agents to work as extensions of ourselves.

But agency is personal in the first place. Having agency makes us effective in the world, which includes the marketplace. This raises some interesting questions. What does it mean for a customer to have full agency in the marketplace? Is it just to show up with sufficient cash and credit? Is it enough to be known as a good customer only within the scope of a company’s CRM system? That’s the current default assumption, and it’s woefully limiting.

Take for example my agency as a customer in the airline business. Most years I fly more than a hundred thousand miles. I bring to the market a portfolio of knowledge, expertise and intent (that is, agency) that should be valuable to myself and valuable to the companies I might deal with. I know a lot about the science and history of aviation, about many airlines old and new, about many airports and their cities, about geography, geology, weather, astronomy and other relevant sciences. I’m a photographer whose work is known within some aviation circles and to a small degree adds value to flying in general. I am also a fairly easy passenger to please. I require no assistance, have no dietary restrictions, show up early and don’t trouble airline personnel with rookie questions. I prefer certain seats but don’t freak out if I don’t get them, and I’m often one of the first to trade seats if it helps a couple or a family sit together on a plane. I am also willing to pay for certain privileges. Yet only the first item—miles flown—is of serious interest to the airline I usually fly, which is United. That I’m a million-mile flyer with United is unknown and uninteresting to all but that one airline.

Thus I have a measure of agency only within United’s system, and somewhat less than that with other members of the Star Alliance, to which United belongs. My self-actualization as a passenger is not my own, but that of a “1K” (100k mile/year) or whatever it says on my United Mileage Plus membership card in a given year. I am a high-value calf in their well-tended corral. It’s nice that my one-company status gets me some privileges with other airlines in the Star Alliance. But, since the IT systems of Star Alliance member airlines are not entirely communicative, those privileges are spotty. Asking any Star Alliance airline to be a cow for the calves of other airlines makes each of them groan.

The other airlines don’t know what they’re missing because they can’t know what they’re missing. All their heuristics are confined to their own CRM systems, plus whatever speculative “personalized” jive they buy from data mills. None of that milled data comes directly from you or me. If Delta buys data about me from, say, Acxiom, my agency is nowhere to be found. All the agency is Acxiom’s, and they’re not even acting as an agency for me in the representational sense of the word. I’ve offloaded no work on them at all, but they’re doing it on my behalf, sort of.

We can only do better if agency is ours and not theirs.

Self-actualization

To consider what self-actualization means in the marketplace, it helps to examine the business sections of bookstores and libraries. They are full of books about self-actualization for companies and their employees; but there are few if any books for customers in the business section. There is nothing, yet, about what it means for you and me to be self-actualized as customers. If there were, what would they say?

In A Theory of Human Motivation, Abraham Maslow placed “The need for self-actualization” at the top of the list of human motivations—above survival, safety, love and esteem.

Being customers is part-time work for most of us. (Even for shopping addicts.) Yet we bring more to market than fits into the scope of any seller’s current systems for ignoring all but a small range of signals from customers. How much more can customers bring, and vendors embrace, if the range of signals and actions on the customer side are freed up?

We can answer that question with another one: How big do we want markets to be?

In Here Comes Everybody: The Power of Organizing Without Organizations, Clay Shirky examines the effects of social networking tools, a disruptive fact of marketplace life for which the business world reached maximum thrall in 2011. (And with good reason: Facebook alone boasted 750 million users.) “None of the absolute advantages of institutions like businesses or schools or governments have disappeared. Instead, what has happened is that most of the relative advantages of those institutions have disappeared—relative, that is to the direct effort of the people they represent.”

While Clay’s focus is on the social, the personal remains more than implicit. Each of us has far more agency in the networked market than we could possibly enjoy in the industrialized marketplace. Since the two are becoming one, our agency will become valuable to industry.

So, then

When you limit what customers can bring to markets, you limit what can happen in those markets.

By the way, all of this stuff will be on tables at the Computer History Museum next week, at VRM Day (Monday) and IIW (Tuesday to Thursday).

*Credit where due: Only Gemini gives links to its dialogs. (Correct me if I’m wrong about that.) Here’s mine, as far as I went with it, before going farther with Claude.


Jon Udell

What Claude and ChatGPT can see on your screen


“If you work with these cloud platforms every day, you have doubtless forgotten that you ever had questions like these. But every newcomer does. And on a continuing basis, we are all newcomers to various aspects of applications and services. In so many ways, the experience boils down to: I am here, what do I do now?

It’s nice if you can share your screen with someone who has walked that path before you, but that’s often impossible or infeasible. LLMs synthesize what others have learned walking the path. We typically use words to search that body of hard-won knowledge. Searching with images can be a powerful complementary mode.”

What ChatGPT and Claude can see on your screen

Part of the LLM series at The New Stack.


@_Nat Zone

The Evolution of Digital Identity: OpenID’s Journey (SIDI Hub Tokyo Opening Speech)


(Machine-translated from Japanese. The English original is here.)

Ladies and gentlemen,

Let me take you on a journey through the OpenID community’s evolution of digital identity, a story that continues to this day. I have been involved in this community from the very beginning, so I believe I can give you a fairly reliable account. There are also a few lessons to be drawn from it.

It all began with OpenID 1.0, a simple yet innovative solution that allowed blog owners to prove their identity online. At its core was a self-asserted identity system, with trust built through history and interactions within the blogging community.

The move to OpenID 2.0 triggered a major expansion. What began as a blog-centric solution attracted diverse communities pursuing similar efforts. With the introduction of OpenID providers, major platforms such as Yahoo joined the ecosystem. The movement gained international momentum, and in Japan the founding of the OpenID Foundation Japan was covered by major TV stations and magazines.

While the community grew rapidly, we also faced challenges. Because of its informal early nature, even intellectual property rights were not properly managed. It took me four years to resolve that situation.

Although OpenID 2.0 was a success, it had its limits. It faced problems of cryptographic weakness and complexity, which hindered adoption. Traditional solutions such as XML digital signatures were similarly problematic. So we developed entirely new signature formats, JWS and JWT, which became the foundation of OpenID Connect.

The impact was remarkable. Google led the way with its implementation, many companies followed, and eventually Apple joined the movement as well. With the smartphone revolution, OpenID Connect spread across the connected world, reaching both private-sector applications and government services.

But is that the end of the story? It is not.

We face three critical challenges:

Only half of the world’s population has access to smartphones.
Many governments still lack the infrastructure to provide their citizens with legal digital IDs.
Traditional identity providers can verify only a limited subset of the information about us, and issuers face a heavy burden in having to provide scalable, 24/7 systems.

Japan’s advanced digital ID infrastructure is certainly impressive, but globally it is the exception. We need more versatile solutions in which issuers only need to be reachable some of the time.

The path forward requires a stronger, more collaborative community. Fortunately, Japan has a community that crosses organizational boundaries. This August, FIDO, W3C, and the OpenID Foundation Japan jointly held an event attended by more than 200 people, exceeding the venue’s capacity.

SIDI Hub Tokyo is another step in this effort, bringing together an even broader community. I thank the Digital Agency for bringing us together and providing this wonderful venue. I look forward to meaningful discussions.

Thank you.

Slides: https://gamma.app/docs/The-Evolution-of-Digital-Identity-OpenIDs-Journey-sb1lbqdx3ozjhg1

Thursday, 24. October 2024

Heres Tom with the Weather

CPJ Head Condemns Israel's Deadly War on Journalists


Jodie Ginsberg, CPJ’s chief executive officer:

No journalist from outside Gaza has been allowed in since the start of that war, and that’s highly unusual. I speak to lots of war correspondents who’ve covered many, many wars over decades, and all of them talk about how unprecedented this is to not have any access whatsoever. And that, of course, puts additional pressure on these journalists.

CPJ Head Condemns Israel’s Deadly War on Journalists in Gaza as IDF Threatens Al Jazeera Reporters

In the interview, Amy Goodman also mentions “James McGovern leading 64 other congressmembers in a letter to Biden and Blinken, urging them to push for Israel to allow in international journalists”


The Pragmatic Engineer

The Pulse #112: Similarities between AI bots using a computer and end-to-end testing

Also: Automated reasoning proves system correctness at AWS, Winamp code shows why software licenses are important, and more

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. Year-on-year growth of software engineering indicates there’s a “Great Stay”, “agents” is the new AI buzzword, Stripe buys stablecoin company for $1.1B, Apple’s Vision Pro struggles, and more.

Similarities between AI bots using a computer and end-to-end testing. Anthropic has released an AI bot that operates a computer like a human. Looking closer, this technology’s limits seem the same as those of end-to-end testing. In fact, some end-to-end testing startups using AI for this task could be ahead of Anthropic.

Automated reasoning at AWS. Automated reasoning is a formal method for validating the correctness of systems using mathematical proofs. AWS uses it a lot, including to harden its S3 object storage service. While LLMs occupy most of the attention among innovative software engineering approaches, automated reasoning seems like a very significant field. Unlike non-deterministic LLMs, automated reasoning is fully deterministic!

Winamp code shows why software licenses are important. Winamp released its source code on GitHub. In a cheeky move, its team added an absurd clause that means nobody can use those additions. Developers then discovered that Winamp appeared to be breaking GPL licenses for open source code and ignoring legal requirements.

1. Industry pulse

Software engineering’s year-on-year growth

Tuesday’s article about the state of the software engineering market mentioned that 2023 was the only year since 2000 when the number of employed software engineers dropped, according to data from Live Data Technologies. Here’s an alternative visualization, which shows the growth percentage in the group tracked:

Employed software engineers each year since the year 2000.

As context, Live Data Technologies tracks 775,000 software engineer profiles. Every year, their data set has more profiles added.

The Great Stay?

Read more


Werdmüller on Medium

The web and I


I grew up on the web. The web grew up with me.

Continue reading on Medium »

Wednesday, 23. October 2024

The Pragmatic Engineer

Twisting the rules of building software: Bending Spoons (the team behind Evernote)


Listen now on Apple, Spotify, and YouTube.

Brought to you by:

The Enterprise Ready Conference on October 30th — For B2B leaders building enterprise SaaS.
DX — DX is an engineering intelligence platform designed by leading researchers. 
ByteByteGo — Ace your next system design interview.

You may not be familiar with Bending Spoons, but I guarantee you’ve encountered some of their well-known products, like Evernote and Meetup. In today’s episode of The Pragmatic Engineer, we sit down with three key figures from the Italy-based startup: cofounder and CEO Luca Ferrari, CTO Francesco Mancone, and Evernote product lead Federico Simionato. Bending Spoons has been profitable from day one, and there's plenty we can learn from their unique culture, organizational structure, engineering processes, and hiring practices. In today’s conversation, we discuss:

The controversial acquisitions approach of Bending Spoons

How Bending Spoons spent more than $1 billion in buying tech companies

How the Evernote acquisition happened

How Bending Spoons operates and how it organizes product and platform teams

Why engineering processes are different across different products

How ‘radical simplicity’ is baked into everything from engineering processes to pay structure.

And much more!

Takeaways

1. Even inside one company, you choose engineering processes based on the maturity of the product. The CTO of Bending Spoons found it completely normal that each team decides on its own approach to testing: e.g., more mature products have a lot more automated tests in place, like unit, integration, and UI tests. New products or less mature ones will have fewer. The same goes for releasing and experimentation – more mature products will have more stages of release and experimentation, but products that are still just being built will not necessarily invest in this.

2. The concept of radical simplicity: this could be applicable far beyond Bending Spoons.

Bending Spoons believes, as a principle, that they should seek out the most radically simple solution and approach. When adding complexity, the person or team proposing it should bring proof of why this complexity is beneficial. Those who retain the simpler status quo should not have to defend it, unless there is evidence and data that adding more complexity truly helps.

3. You don’t need to copy popular approaches to succeed as a product or engineering team. Bending Spoons seems to have devised a way that makes sense for them to operate, and they did not “copy” common approaches from other companies. A few examples:

Their most popular language is Python. This is a relatively rare choice for most companies, but not for them! At the same time, teams can choose the technologies they use, and there are teams onboarding to other languages like Rust.

They do not have career ladders like most companies do – at least not for now. No bonuses either.

The concept of radical simplicity.

In some ways, they didn’t follow any established approach because they didn’t really get much advice in the early years (they struggled to even attract VCs!), so they figured it out on their own.

If a small company in Italy with five devs could do this and keep figuring out what works for them as they grow, what is stopping you and your team from doing so?

Timestamps

(2:09) Welcome, Luca, Francesco, and Federico from Bending Spoons

(03:15) An overview of the well-known apps and products owned by Bending Spoons

(06:38) The elephant in the room: how Bending Spoons really acquires companies

(09:46) Layoffs: Bending Spoons’ philosophy on this

(14:10) Controversial principles

(17:16) Revenue, team size, and products

(19:35) How Bending Spoons runs AI products and allocates GPUs

(23:05) History of the company

(27:04) The Evernote acquisition

(29:50) Modernizing Evernote’s infrastructure

(32:44) “Radical simplicity” and why they try for zero on calls

(36:13) More on changes made to the Evernote systems

(41:13) How Bending Spoons prioritizes and ships fast 

(49:40) What’s new and what’s coming for Bending Spoons

(51:08) Organizational structure at the company

(54:07) Engineering practices

(57:03) Testing approaches

(58:53) Platform teams

(1:01:52) Bending Spoons tech stack and popular frameworks

(1:05:55) Why Bending Spoons hires new grads and less experienced engineers

(1:08:09) The structure of careers and titles at Bending Spoons

(1:09:50) Traits they look for when hiring 

(1:12:50) Why there aren’t many companies doing what Bending Spoons does

References

The Pragmatic Engineer deepdives relevant for this episode:

Good attrition, bad attrition for software engineers

Healthy oncall practices

Shipping to production

QA across the tech industry:

Where to find Luca Ferrari:

• X: https://x.com/luke10ferrari

• LinkedIn: https://www.linkedin.com/in/luca-ferrari-12418318

Where to find  Francesco Mancone:

• LinkedIn: https://www.linkedin.com/in/francesco-mancone

Where to find Federico Simionato:

• X: https://x.com/fedesimio

• LinkedIn: https://www.linkedin.com/in/federicosimionato

Mentions during the episode:

• Evernote: https://evernote.com/

• Meetup: https://www.meetup.com/

• Bending Spoons: https://bendingspoons.com/

• Bending Spoons Acquires Mosaic Group Digital Assets From IAC: https://www.businesswire.com/news/home/20240110239723/en/Bending-Spoons-Acquires-Mosaic-Group-Digital-Assets-From-IAC

• StreamYard: https://www.businesswire.com/news/home/20240110239723/en/Bending-Spoons-Acquires-Mosaic-Group-Digital-Assets-From-IAC

• Issuu: https://issuu.com/

• WeTransfer: https://wetransfer.com/

• Remini: https://remini.ai/

• Netflix’s Famous 'Keeper Test': https://www.businessinsider.com/netflix-new-culture-memo-keeper-test-changes-2024-6

• Bending Spoons Values: https://bendingspoons.com/values

• Splice: https://splice.com/

• GoPro: https://gopro.com/

• Thirty more exciting improvements in Evernote: https://evernote.com/blog/30-improvements-april-june-2024

• Rust: https://www.rust-lang.org/

• Jobs at Bending Spoons: https://jobs.bendingspoons.com/

• Evernote on YouTube: https://www.youtube.com/@evernote

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.

Tuesday, 22. October 2024

The Pragmatic Engineer

State of the software engineering job market in 2024

A deep dive into job market trends, the companies and cities hiring the most software engineers, growth areas, and more. Exclusive data and charts

How’s the market for software engineers and engineering managers doing? During the past year, this publication took the pulse of tech recruitment pretty regularly:

Amazon reducing number of engineering managers a few weeks ago

Surprise uptick in software engineering recruitment in summer

End of zero percent interest rates and what it means for software engineers and managers

… and regularly touch on market changes and trends in weekly The Pulse issues.

Even so, nothing beats high-quality data for insights. To get some, The Pragmatic Engineer collaborated with two companies whose bread and butter is tracking employment stats and jobs data, and both businesses were kind enough to create unique data sets and charts for this article. They are:

Live Data Technologies: a startup based in California, which has developed a method of prompt-engineering major search engines to capture near-real-time data on employment changes in the US. They leverage this publicly available information to monitor 1M+ job changes, and 300M+ employment validations monthly, using a proprietary process. They can track hiring trends across companies, roles, functions, levels, industries, and locations. 

The business normally sells this data to investors like VCs, private equity, quant funds, etc, and also to platforms that incorporate people data, such as sales tech, CRMs, talent platforms, private wealth, HR teams, and go-to-market leaders. It also serves as a data resource for well-known media outlets, including the Wall Street Journal, The Economist, The Information, Bloomberg — and today also for The Pragmatic Engineer! If you have questions about their data, please do ping them!

TrueUp: a tech career platform that scans open jobs at Big Tech companies and “top” startups — doing this in real time. The company monitors 55,000+ open software engineering jobs across regions such as the US, Europe, India, and other major markets. Their platform processes close to 220,000 changes in listings per day from all tech jobs, and produces interesting analysis like hottest companies by investment, and sector reports. Because TrueUp only focuses on Big Tech and “top” startups, they exclude many other tech jobs, and they thus typically focus on the top two tiers within the trimodal split of software engineering jobs.

Today, we cover:

How has the software engineering discipline grown / shrunk?

Which companies have the most openings?

Which companies are growing or shrinking? 

Where do top companies hire software engineers from?

Which engineering disciplines are growing fastest?

Interesting details about seniority and tenure

How easy is it to find a new job?

How common are remote jobs?

Which cities have the most engineering job opportunities?

Which companies are hiring much less or much more than before?

What are the most sought-after technologies?

Related deepdives on the tech market from the past years:

Surprise uptick in software engineering recruitment (2024)

The end of 0% interest rates: what the new normal means for software engineers (2024)

The tech market, as seen by hiring managers (2023)

State of the tech market in 2023

The “perfect storm” causing an insane tech hiring market in 2021

1. How has the software engineering discipline grown / shrunk?

It feels like 2023 was a bad year for software engineering due to unprecedentedly large layoffs. Zooming further out from 2000, what are the long-term job trends? Here's the data:

An interactive visualization:

2023 was, indeed, historic as the first year since Y2K when the number of software engineering jobs shrunk. Growth has resumed this year, but it’s at least possible that a long-term flattening of the growth curve is emerging.

Let’s take a closer look at how the industry grew after 2001’s Dotcom Bust, and during the 2008-2021 zero interest rate period:

This visualization puts a few things into perspective:

In the 4 years after the 2001 Dotcom Bust, the tech industry grew relatively slowly

A decade of growth happened between 2011 and 2021 for combined reasons like the mobile and cloud revolutions, and massive VC investment during a zero interest rate period. Check out analysis of this in a deep dive on what 0% interest rates and their demise mean for tech

Since 2022, growth has slowed to levels not seen since 2001-2004

Key moments in smartphones and cloud computing, and for interest rates. Did they help create the strong growth of 2011-2021? Source: The end of 0% interest rates

The future does not mirror the past, of course, and right now there's a cyclical slowdown in VC investment, coupled with huge AI investment. But elsewhere, there are currently no similar triggers for growth like the smartphone and cloud revolutions that ran from 2007 through the 2010s.

2. Which companies have the most openings?

Here’s how software engineering job openings have changed over the past two years, globally:

The good news is there’s been an uptick since the start of this year. The bad news is that the number of vacancies is still around half of what it was in 2022. This data matches a deep dive into global software engineering vacancies falling back to 2019 levels.

Companies hiring the most

If you’re looking for openings, here are the top 10 tech companies recruiting the most new software engineers:

This list has some surprises:

Amazon is unsurprisingly on top since announcing its full return to the office, which caused massive churn and the need to backfill

Oracle is a big surprise, and this hiring spike is covered in depth below

TikTok and ByteDance’s aggressive hiring, with 6x as many roles as rival Meta, is a surprise. The company is the only Asian business on the list, and it shows how much the China-headquartered firm is investing. This is especially interesting given there are calls for the app to be banned in the US, or for it to be forced into a sale.

IBM hiring more than Google, Microsoft, NVIDIA, and Meta is also a major surprise, and it suggests the 114-year-old computing giant is doing pretty well.

SAP is the only European-headquartered company on the list and is the leader in enterprise resource planning (ERP) software. SAP hiring enough to make it to the top 10 suggests its business outlook is also positive.

Other notable tech companies outside the top 10:

Senior-and-above engineering openings

Oracle hiring the most senior-and-above folks by a wide margin is unexpected. Uber, PayPal, and Databricks have a higher ratio of senior-and-above positions open than most other companies, and in larger numbers.

One important caveat to add here is Oracle’s “senior inflation” in titles. The Senior Member of Technical Staff (Senior MTS) at Oracle is the equivalent of L4 / Software Engineer 2 / mid-level software engineer at other companies. This caveat applies throughout all other senior-and-above categorizations at Oracle.

Engineering leadership openings

If you’re an engineering leader, the companies with the most leadership positions open are below. Again, the clear leader is Oracle.

Other places with open engineering leadership positions, in tech lead and engineering management:

Why is Oracle hiring heavily?

So many openings at Oracle in senior-and-above positions is quite surprising. What explains this is that in the data set, “principal engineer” is considered a leadership position. However, at Oracle, the scope of a principal engineer resembles that of a senior engineer elsewhere in Big Tech.

Oracle seems to be doing very well, business-wise. The company’s market cap is an impressive $483B at time of publishing, making it the #11 biggest publicly traded tech company, just below Tesla ($693B) and Tencent ($508B), and above Netflix ($325B) and Samsung ($285B).

Here’s its market cap change over the last two years:

Oracle’s market cap has nearly tripled in two years

What’s behind this strong performance? Annual revenue climbed from $42B to $52B in two years, the company remains strongly profitable at $8-10B per year, and it forecasts hitting $100B in revenue by 2029. Remember, OpenAI is projecting that it will also hit $100B revenue in 2029 – something I think is impossible in that time frame, given the company is expected to generate “only” $4B revenue in 2024.

So, where does Oracle hire from? Using the Moneyball tool from Live Data Technologies, we can see recruitment in the last year:

Where Oracle is hiring from, and where people at Oracle move to

Is Oracle hiring so much to backfill for attrition? Based on the data, the answer is “no.” Far more people joined Oracle last year than left.

Oracle acquired Cerner Corporation in 2021, so that movement doesn’t really count. The biggest sources of hires are Amazon, Microsoft, Accenture, IBM, and Cognizant. Last year, many fewer people left Oracle than were hired. Among leavers, the leading destinations were Salesforce, Microsoft, IBM, NetSuite, and Workday.

3. Which companies are growing and shrinking? 

Moneyball is a neat product from Live Data Technologies that’s open for anyone to use after registration. With this tool, up to 10 companies can be compared on headcount data. So, let’s take a look!

Which companies are growing their headcounts, and which have shrunk theirs significantly in the last two years?

Big Tech

Meta, Amazon, Google and Microsoft all did large cuts between the end of 2022 and mid-2023, meaning that:

Amazon and Meta have shrunk in the past two years

Microsoft is around the same size

Google has grown by around 7%

Apple is the only Big Tech company not to do layoffs, and grew in headcount by about 5% in that time

Apart from Google, no Big Tech company has shown notable growth for 6 months

Faster-growing companies

Which companies have grown headcounts rapidly in the last two years? The standouts:

NVIDIA is benefitting hugely from the AI industry’s rocketing growth, and briefly became the world’s most valuable company.

Tesla had impressive headcount growth until early 2024, after which the EV maker made job cuts.

Workday growing by 25% in two years is a bit puzzling, and seems to suggest the HR SaaS suite is doing very well. Workday is doing fantastically well financially: revenue grew last year from $6B to $7B, and profit (net income) went from slightly negative to $1.3B. Headcount growth and positive financials suggest Workday is thriving when few similar companies can say the same.

Airbnb seems to have been on a hiring spree throughout 2022, and headcount has been flat since.

Shrinking workplaces

Which larger tech companies have seen the most headcount reductions?

Twilio had several layoffs: 11% in September 2022, 17% in February 2023, and 5% in December 2023. Today, the headcount is 27% lower than two years ago. At the same time, it was one of the first to commit to full-remote work in a powerfully-worded statement two weeks ago.

Spotify did deep job cuts from the middle of 2022, and more in late 2023, which reduced the workforce by around 25%.

Shopify had a very deep cut early 2023, and has kept headcount flat since.

Salesforce did layoffs in early 2023 and hasn’t been hiring since. 

DoorDash has done no mass layoffs and seems to be letting natural attrition of about 10% per year do its job by not backfilling. Check out a deep dive into typical attrition for software engineers.

4. Where do top companies hire software engineers from?

Hottest AI companies

OpenAI, Anthropic, xAI, and Mistral are the hottest AI companies. They have each raised billions in funding, are growing at what feels like an exponential pace, and all could be “rocket ships” in terms of company and career trajectories. Which places do these leaders hire from? The data:

Most common sources of hires by company:

OpenAI: Google, Meta, Stripe, Rockset, Microsoft

Anthropic: Google, Stripe, Meta, Cruise, Slack

xAI: Tesla, Google, Scale AI, OpenAI

Mistral: Google, Meta, Amazon, NVIDIA, Chronosphere

Google is the biggest source of all hires by these companies except for xAI, where more Tesla employees join. Elon Musk is CEO of Tesla and founder of xAI.

It’s interesting that OpenAI is already a sizable enough source of hires to show up in the data at xAI. Also, the Chronosphere-to-Mistral transition is interesting, and we cover it in The story of Chronosphere.

Also check out articles on OpenAI’s engineering culture, and how Anthropic builds products.

Big Tech

Last week, we looked into why techies leave Big Tech; but from where do software engineers join the largest tech companies?

Companies hired from, by popularity:

Amazon: Microsoft, Google, Tesla, Accenture, Boeing

Meta: Microsoft, Amazon, Google, TikTok

Google: Microsoft, Amazon, Qualcomm, Apple, Intel

Microsoft: Amazon, Intel, Google, VMWare

Apple: Google, Qualcomm, Intel, Amazon, Microsoft

There are a few surprises:

Intel and Qualcomm are popular sources of Big Tech hires. Both companies have high engineering bars. Intel is struggling business-wise, so it’s little surprise people are leaving. Qualcomm seems to be doing better; my sense is that higher compensation packages might be partly why some of its staff move to companies like Google and Apple.

Where’s Meta? I assumed the social media giant would be a major source of hires for companies like Google, but it did not make the top 5 places from where Big Tech companies hire. This suggests that Big Tech doesn’t need to hire from Meta — they find enough strong candidates at other companies!

Amazon and Microsoft hire from each other a lot. The #1 company Amazon hires from is Microsoft, and vice-versa. A likely reason is that both are headquartered in the Seattle area, pay similar compensation, and their offices are in close proximity.

5. Which engineering disciplines are growing fastest?

Read more

Monday, 21. October 2024

@_Nat Zone

Digital Identity: The Latest Global Developments – October 22 Edition

Global Market Trends

iProov’s transaction volume up 63% year over year

Biometric identity vendor iProov announced that its transaction volume increased 63% compared to the same period last year. According to the company, this growth is driven by the rapid digitization of services and by the growing threat of sophisticated cyberattacks, particularly deepfakes powered by generative AI. iProov recently established new partnerships with Avoco Secure, Guardline, and TrustCloud, expanding its ecosystem.

Okta’s partner-first growth strategy

Okta is adopting a partner-first growth strategy and rolling out the following measures:

A central hub of self-service resources and tools for partners
On-demand education and timely information
Self-service marketing automation
Project submission capabilities
Referrals of partner solutions to customers

Okta has also launched a new initiative at the OpenID Foundation, the IPSE WG, together with Microsoft, Google, Ping Identity, and others, aimed at strengthening the security of SaaS apps, where risks are further heightened by the integration of AI agents.

Market forecast

According to analysis by SkyQuest, the global identity verification market is expected to reach $39.73 billion by 2031, with a CAGR of 16.7% over the forecast period. This growth is supported by the following factors:

An increase in identity-related fraud and cybercrime
Expanding data security use cases
Digitization policies of organizations and governments
Adoption of automation, machine learning, and AI technologies

Latest Developments by Country

Anguilla

The Government of Anguilla (GOA) and Identity Digital announced that they will cooperate to increase government revenue from .AI domains. Thanks to the boom in AI technology and the growing use of .AI domains, these accounted for 20% of the government’s total revenue in 2023. The two parties aim to build “a world-class registry management program that prioritizes quality domains and establishes long-term trust in .AI domain names.”

Vietnam

In Vietnam, 60 million taxpayers will be able to use the mobile tax application “eTax Mobile” with electronic identification by 2025. As part of its plan to roll out digital ID across the entire tax administration system, the General Department of Taxation will integrate tax codes into the national database. Electronic authentication data is managed in the citizen-facing mobile application “VneID,” and according to Identity Week, the linking of all taxpayers’ electronic IDs and tax information is expected to be completed by next year.

Europe

To make online identity verification safer and more efficient, Mastercard is rolling out its Identity Attribute Verification service across European countries. The service allows businesses and financial institutions to verify information such as age, date of birth, and address using existing payment card data, without requiring customers to submit additional documents.

United States

BeyondID, a managed identity solutions provider (MISP), announced a fraud prevention solution for its customers. It is offered as part of the company’s managed identity solutions and helps deliver a “secure total experience.”

Panama

The Panamanian government has decided to introduce a digital ID for citizens and permanent residents. The digital ID:

Digitizes the information in the physical document issued by the Electoral Tribunal
Carries the same legal force as the traditional physical ID card
Can be used for online services at all public institutions and private companies
Securely stores personal documents such as the digital ID, medical history, and resumes
Is secured through facial recognition and biometric authentication
Is managed through a smartphone app

Note that the digital ID is optional and is not a replacement for the physical identity card.

Kyrgyzstan

Kyrgyzstan positions digital ID as a key component of its national development plan. Its 2024-2028 digital transformation vision integrates public services through the Tunduk platform, modeled on Estonia’s X-Road system. Cross-border digital ID cooperation with Uzbekistan and Kazakhstan is also under consideration. Global companies such as Mühlbauer and Emptech are responsible for providing biometric data and digital IDs.

(Source) https://www.thinkdigitalpartners.com/news/2024/10/21/digital-identity-global-roundup-188/


Michael Ruminer

Using LM Studio and CrewAI with Llama


I was recently experimenting with CrewAI, but in one experiment I wanted to use a local LLM hosted by LM Studio. Why LM Studio? Well… because that was the experiment. I wanted to see how I’d use LM Studio to host my model versus Ollama, and then use it from CrewAI. Below is a very simple setup for doing this.

It’s good to point out that CrewAI uses LiteLLM under the hood as a sort of proxy. This gives CrewAI the ability to talk to a range of LLMs without needing to do much in terms of handling them itself.

What Doesn’t Work and Why

The first thing I found is what doesn’t work. The LLM class in CrewAI allows for the instantiation of an LLM that can then be used by agents. Spinning up an LLM instance for a local Ollama-hosted model can look like this:

ollama_31_8b = LLM(
    model="ollama/llama3.1",
    base_url="http://localhost:11434"
)

This works just fine if the LLM is hosted inside Ollama, but you get response exceptions if, instead, you try to host it inside LM Studio at the same server port.

ERROR: LiteLLM call failed: litellm.APIConnectionError: 'response'

First, you’ll notice that the base_url doesn’t have the “/v1” suffix that LM Studio uses in its server setup. If you fix that, thinking it might work, you’ll find that you likely get the same error.

Secondly, you may realize that the model property in your LLM instantiation uses a [PROVIDER]/[MODEL] format. I tried removing the provider portion to see what would happen. The results were:

llm.py-llm:161 - ERROR: LiteLLM call failed: litellm.BadRequestError: LLM Provider NOT provided.

That’s a reasonable outcome.

What Does Work

Lastly, I remembered that LM Studio serves OpenAI-compatible endpoints.

A quick look at the LiteLLM docs provided the answer I needed; set the provider as “openai”. This results in a final outcome of:

ollama_31_8b = LLM(model="openai/llama3.1", base_url="http://localhost:11434/v1")
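
For reference, here is a minimal sketch (mine, not from the original post) of roughly what that configuration reduces to when LiteLLM makes the call directly, assuming LM Studio’s OpenAI-compatible server is listening on port 11434. The prompt and the api_key value are hypothetical placeholders; LM Studio does not actually check the key, but the OpenAI-style client expects one.

import litellm

response = litellm.completion(
    model="openai/llama3.1",               # [PROVIDER]/[MODEL], with "openai" as the provider
    api_base="http://localhost:11434/v1",  # LM Studio's OpenAI-compatible endpoint
    api_key="lm-studio",                   # placeholder; LM Studio ignores the key
    messages=[{"role": "user", "content": "Tell me a short joke."}],
)
print(response.choices[0].message.content)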

Now, if you try running it with an agent using the instantiated LLM, it will work. Below is example code of the LLM and agent creation, where I had the Llama 3.1:8b model hosted in LM Studio on port 11434.

@CrewBase
class MyCrew():

    llama_31_8b = LLM(
        model="openai/llama3.1",
        base_url="http://localhost:11434/v1"
    )

    @agent
    def joke_generator(self) -> Agent:
        return Agent(
            config=self.agents_config['joke_generator'],
            verbose=True,
            llm=self.llama_31_8b
        )

Note

Note that in LM Studio I had my server port set to 11434 versus the default of 1234. It made it easier as I switched back and forth between Ollama and LM Studio; I didn’t need to modify the port. 11434 is the default Ollama port.
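
To round this out, here is a hedged sketch of how the rest of such a crew might be wired up and kicked off. The YAML config paths, the joke_task entry, and the topic input are hypothetical placeholders of mine; only the LLM and agent wiring come from the setup above.

from crewai import Agent, Crew, LLM, Process, Task
from crewai.project import CrewBase, agent, crew, task

@CrewBase
class MyCrew():
    # Hypothetical config locations; CrewAI projects usually keep these under config/.
    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    llama_31_8b = LLM(
        model="openai/llama3.1",
        base_url="http://localhost:11434/v1"
    )

    @agent
    def joke_generator(self) -> Agent:
        return Agent(
            config=self.agents_config['joke_generator'],
            verbose=True,
            llm=self.llama_31_8b
        )

    @task
    def joke_task(self) -> Task:
        # 'joke_task' is assumed to be defined in tasks.yaml; it is not in the original post.
        return Task(config=self.tasks_config['joke_task'])

    @crew
    def crew(self) -> Crew:
        # CrewBase collects the decorated agents and tasks into self.agents and self.tasks.
        return Crew(agents=self.agents, tasks=self.tasks, process=Process.sequential, verbose=True)

result = MyCrew().crew().kickoff(inputs={"topic": "programming"})
print(result)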

When Might I Use This

When might I use this? If I am programming, probably rarely. I could instead host the model in Ollama. I’d use LM Studio if I want to host a model and chat with it. In that scenario, I’d probably be more likely to use Ollama with AnythingLLM which would also provide me with some Retrieval-Augmented Generation (RAG) capabilities. Nonetheless, it was an experiment and I proved, for myself, it could easily be done.


Damien Bod

Implement security headers for an ASP.NET Core API supporting OpenAPI Swagger UI


This article shows how to implement security headers for an application supporting an API and a Swagger UI created from an Open API specification in .NET 9. The security headers are implemented using the NetEscapades.AspNetCore.SecurityHeaders NuGet packages from Andrew Lock.

Code: https://github.com/damienbod/WebApiOpenApi

Deploying a web application which supports both an API and a UI brings different levels of security requirements. The Swagger UI is created from an Open API specification and uses inline Javascript with no hashes or nonces, which requires weak security definitions. The API has no UI and can use the maximum security header definitions; it can be locked down as much as possible against the typical web UI attacks.

The API endpoints can be secured using a definition with strict security headers and a lot of browser features locked down.

public static class SecurityHeadersDefinitionsAPI
{
    private static HeaderPolicyCollection? policy;

    public static HeaderPolicyCollection GetHeaderPolicyCollection(bool isDev)
    {
        // Avoid building a new HeaderPolicyCollection on every request for performance reasons.
        // Where possible, cache and reuse HeaderPolicyCollection instances.
        if (policy != null) return policy;

        policy = new HeaderPolicyCollection()
            .AddFrameOptionsDeny()
            .AddContentTypeOptionsNoSniff()
            .AddReferrerPolicyStrictOriginWhenCrossOrigin()
            .AddCrossOriginOpenerPolicy(builder => builder.SameOrigin())
            .AddCrossOriginEmbedderPolicy(builder => builder.RequireCorp())
            .AddCrossOriginResourcePolicy(builder => builder.SameOrigin())
            .RemoveServerHeader()
            .AddPermissionsPolicyWithDefaultSecureDirectives();

        policy.AddContentSecurityPolicy(builder =>
        {
            builder.AddObjectSrc().None();
            builder.AddBlockAllMixedContent();
            builder.AddImgSrc().None();
            builder.AddFormAction().None();
            builder.AddFontSrc().None();
            builder.AddStyleSrc().None();
            builder.AddScriptSrc().None();
            builder.AddBaseUri().Self();
            builder.AddFrameAncestors().None();
            builder.AddCustomDirective("require-trusted-types-for", "'script'");
        });

        if (!isDev)
        {
            // maxage = one year in seconds
            policy.AddStrictTransportSecurityMaxAgeIncludeSubDomains(maxAgeInSeconds: 60 * 60 * 24 * 365);
        }

        return policy;
    }
}

The API calls would return the following headers in the HTTP response. The CSP blocks Javascript.

The Swagger definition allows unsafe Javascript. This allows for XSS attacks and is a weak level of security, but it is required due to the way the Swagger UI is created.

public static class SecurityHeadersDefinitionsSwagger
{
    private static HeaderPolicyCollection? policy;

    public static HeaderPolicyCollection GetHeaderPolicyCollection(bool isDev)
    {
        // Avoid building a new HeaderPolicyCollection on every request for performance reasons.
        // Where possible, cache and reuse HeaderPolicyCollection instances.
        if (policy != null) return policy;

        policy = new HeaderPolicyCollection()
            .AddFrameOptionsDeny()
            .AddContentTypeOptionsNoSniff()
            .AddReferrerPolicyStrictOriginWhenCrossOrigin()
            .AddCrossOriginOpenerPolicy(builder => builder.SameOrigin())
            .AddCrossOriginEmbedderPolicy(builder => builder.RequireCorp())
            .AddCrossOriginResourcePolicy(builder => builder.SameOrigin())
            .RemoveServerHeader()
            .AddPermissionsPolicyWithDefaultSecureDirectives();

        policy.AddContentSecurityPolicy(builder =>
        {
            builder.AddObjectSrc().None();
            builder.AddBlockAllMixedContent();
            builder.AddImgSrc().Self().From("data:");
            builder.AddFormAction().Self();
            builder.AddFontSrc().Self();
            builder.AddStyleSrc().Self().UnsafeInline();
            builder.AddScriptSrc().Self().UnsafeInline(); //.WithNonce();
            builder.AddBaseUri().Self();
            builder.AddFrameAncestors().None();
        });

        if (!isDev)
        {
            // maxage = one year in seconds
            policy.AddStrictTransportSecurityMaxAgeIncludeSubDomains(maxAgeInSeconds: 60 * 60 * 24 * 365);
        }

        return policy;
    }
}

The HTTP response would look something like this:

The security headers are set up to always use the strict API definitions, except for the Swagger routes, which get the weakened Swagger definitions. This is used in development. I still don’t deploy this to production, unless the Swagger UI is absolutely required. In production, all responses use the stricter security header definitions.

// Open up security restrictions to allow this to work
// Not recommended in production
var deploySwaggerUI = builder.Configuration.GetValue<bool>("DeploySwaggerUI");
var isDev = builder.Environment.IsDevelopment();

builder.Services.AddSecurityHeaderPolicies()
    .SetPolicySelector((PolicySelectorContext ctx) =>
    {
        // Some weak security headers are used due to the Swagger UI deployment;
        // this should only be used in development
        if (deploySwaggerUI)
        {
            // Weakened security headers for Swagger UI
            if (ctx.HttpContext.Request.Path.StartsWithSegments("/swagger"))
            {
                return SecurityHeadersDefinitionsSwagger.GetHeaderPolicyCollection(isDev);
            }

            // Strict security headers
            return SecurityHeadersDefinitionsAPI.GetHeaderPolicyCollection(isDev);
        }
        // Strict security headers for production
        else
        {
            return SecurityHeadersDefinitionsAPI.GetHeaderPolicyCollection(isDev);
        }
    });

The security headers are added as middleware using the UseSecurityHeaders methods.

app.UseSecurityHeaders();

Notes

This setup works well, and the correct headers for the API are used in development both with and without the Swagger UI. No weakened headers are deployed to production.

Links
Improving application security in an ASP.NET Core API using HTTP headers – Part 3

https://securityheaders.com/

https://csp-evaluator.withgoogle.com/

Security by Default Chrome developers

A Simple Guide to COOP, COEP, CORP, and CORS

https://github.com/andrewlock/NetEscapades.AspNetCore.SecurityHeaders

https://scotthelme.co.uk/coop-and-coep/

https://github.com/OWASP/ASVS


Doc Searls Weblog

Comet, second try

Comet Tsuchinshan–ATLAS, flying toward its tail, away from the Sun.

I shot the comet this time with a real camera: my Sony a7iv with a FE 70-200 mm F2.8 GM OSS II lens set at f3.5 at 135mm for 10 seconds on a shitty tripod I got at a thrift shop for $5. (I have good ones elsewhere.) This was at 8:40pm, just as the moon was rising behind my back and before dusk had turned to night with plenty of unwanted light from street lamps and the other usual challenges.

I couldn’t see it, but my iPhone 16 Pro Max did:

Using other shots I maneuvered myself to a place where the camera was in shadow from street lights and I could put the camera and tripod on the sidewalk while locating the comet to the right of that evergreen tree, so it was easy to aim in the general direction and get the shot.

If it’s clear again tomorrow night, I’ll go find a darker place at a later time and see what I can do.

Friday, 18. October 2024

Doc Searls Weblog

A Comet Hunt

Comet Tsuchinshan-ATLAS, more visible to the camera than the eye.

Tonight was the first completely clear sky in a while, almost perfect for hunting Comet Tsuchinshan-ATLAS, which for a few more nights will be gracing our evening sky.

With a full moon high in the eastern sky, and plenty of light pollution from the town around me, the comet was hard to see. Fortunately, the camera in my new iPhone 16 Pro Max sees better than I do, so I was able first to find the comet in a photo of the western sky, and then go back outside (from the restaurant where we were eating—the Half Bottle Bistro) and aim the phone’s 5x lens at it. That’s what got me the shot above. Step through this album of shots to see how I got the phone to zero in on the comet while my eyes could hardly see it.

I found this guide from the Stamford Advocate very helpful for finding it as well.

Thursday, 17. October 2024

Mike Jones: self-issued

OAuth 2.0 Protected Resource Metadata Specification in RFC Editor Queue

I’m pleased to report that the “OAuth 2.0 Protected Resource Metadata” specification has been approved by the IESG and is now in the RFC Editor queue. The version approved by the IESG and sent to the RFC Editor is: https://www.ietf.org/archive/id/draft-ietf-oauth-resource-metadata-13.html It joins OAuth 2.0 Security Best Current Practice and JWT Response for OAuth Token Introspection, […]

I’m pleased to report that the “OAuth 2.0 Protected Resource Metadata” specification has been approved by the IESG and is now in the RFC Editor queue.

The version approved by the IESG and sent to the RFC Editor is:

https://www.ietf.org/archive/id/draft-ietf-oauth-resource-metadata-13.html

It joins OAuth 2.0 Security Best Current Practice and JWT Response for OAuth Token Introspection, which are also both currently there.

Thanks to the IETF directorate reviewers and IESG members for their feedback that resulted in improvements to the specification!


The Pragmatic Engineer

The Pulse #111: Did Automattic commit open source theft?

The maker of WordPress took a plugin with 2M installs from its biggest rival: has a red line been crossed? Also: OpenAI’s impossible business projections, top AI researchers making more than engineers, and more.

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. Sudden layoffs at Meta, Spotify confirms work-from-anywhere, US mandates “click-to-cancel,” a historic SpaceX booster catch – and more.

Did Automattic commit open source theft? The maker of WordPress took a plugin with 2M installs, owned by its biggest competitor (WP Engine), and commandeered it. The goal was clear: hit WP Engine where it hurts the most, throwing decades of open source ethics in the bin. It sets a dangerous precedent for the industry. Read the un-paywalled version of this section here.

OpenAI’s impossible business projections. According to internal documents, OpenAI expects to generate $100B in revenue in 5 years, roughly 25x what it currently makes. This would mean OpenAI brings in more money than NVIDIA or Tesla!

Top AI research scientists earn more than engineers. Few companies can compete with the compensation that AI startups pay ML engineers. Still, research scientists at these startups can make roughly double what the highest-paid ML engineers earn. It’s an interesting new dynamic.

1. Industry pulse

Sudden layoffs at Meta

Meta is executing layoffs in the US inside the WhatsApp, Instagram, and Reality Labs groups, as reported by The Verge. A high-profile person let go in this round is security engineer Jane Manchun Wong, who became known in tech circles for uncovering new features shipped in Big Tech mobile apps hidden behind feature flags.

Elsewhere at Meta, there were dismissals for unusual reasons a week or two ago, when some people were let go from the social media giant’s LA site for misusing $25 meal credits intended for use when working late in the office. Instead, the vouchers were used to have meals delivered to homes, and spent on things like wine glasses and laundry detergent, as per the Financial Times. The FT reports that dismissals were for repeat incidents.

This kind of firing would probably not have happened a few years ago, I suspect. Back when companies like Meta were struggling to hire enough talent, violations like misusing meal vouchers would likely have resulted in a warning, and no more.

It’s an interesting contrast: Meta had zero mass layoffs for the first 18 years of its existence, from 2004 until November 2022. Since then, cuts have been regular but unpredictable in their timing. This story is a reminder of just how much Big Tech is changing: companies no longer offer the kind of cozy job security that was taken for granted in the 2010s; even Google is no longer a “rest and vest” type of place.

US mandates “click-to-cancel”

Read more

Wednesday, 16. October 2024

Decentral Gabe

The Greatly Exaggerated Demise of SSI: A Rebuttal to Premature Eulogies

💡 This is written in response to a post titled "Why Verifiable Credentials Aren't Widely Adopted & Why Trinsic Pivoted" A forewarning—there will be no tears here and, please, no hugging. credit, XKCD https://xkcd.com/386/ The post I'm responding to

💡 This is written in response to a post titled "Why Verifiable Credentials Aren't Widely Adopted & Why Trinsic Pivoted"

A forewarning—there will be no tears here and, please, no hugging.

credit, XKCD https://xkcd.com/386/

The post I'm responding to makes a point about people resisting its conclusions because their livelihoods depend on it. But let’s be real—Trinsic depends on this too, just as much as mine does. We’re both here, with skin in the game, and that's a good thing. I continue to work on figuring out how to make self-sovereign identity work at scale because I believe it's worth it. If I was not convinced of its value, I wouldn't be here debating it; I'd have moved on to easier, more profitable paths. That’s the commitment this industry requires—guts, patience, and a long-term perspective.

So let's set the stage clearly. This isn't just about egos or jobs; it's about a vision for the future of digital identity. And while you’ve pivoted, it's worth looking at the broader picture and challenging the notion that SSI is somehow "dead" just because it's hard. We’re not the first to face a daunting path—nor will we be the last.

The Market Isn't the Oracle You Think It Is

Your unwavering faith in the market's infallibility is touching. You proclaim, "The market is the objective reality," as if channeling the spirit of a laissez-faire economist at a startup retreat. But let's examine recent history.

Betamax was superior to VHS, but marketing muscle won the day. The QWERTY keyboard isn't ergonomic, yet it's entrenched due to historical happenstance. The invisible hand often leads us down paths of inefficiency and fragmentation, leaving a trail of superior but abandoned technologies in its wake.

Fragmentation and incompatible standards aren't signs of the market "choosing" the best option; they're about who can shout "MINE!" the loudest while elbowing competitors off the playground. Sometimes, it takes a guiding hand—a nudge from governments or standard bodies—to steer the market toward what's efficient and beneficial for all.

Pivoting Isn't Prophecy

you can do better, hon

The leap from "We couldn't make it work" to "It can't work" is a logical chasm too wide to cross. Your pivot from SSI evangelist to skeptic, while understandable, isn't a revelation of universal truth.

This is akin to Friendster declaring social media a dead end, or Blockbuster declaring video streaming a passing fad. The inability to monetize a nascent technology doesn't negate its potential; it merely highlights the gap between visionary ideals and the brutal realities of running a startup. For example, certain decentralized identity projects have successfully implemented verifiable credentials in applications like educational certificates, employment verification, and gaining access to the financial system, demonstrating real-world utility and growing momentum. These efforts show that verifiable credentials are gaining traction in specific, focused use cases.

Let me be unequivocal: I commend you for having the guts to start a company, work hard for years, and continue to try to make it viable. That is nothing short of admirable. Your pivot might make sense for Trinsic, but it's hardly the canary in the coal mine for SSI.

SSI, like many transformative technologies before it, may simply be awaiting its "streaming moment" — that convergence of technological readiness, market understanding, and user demand that catapults it from interesting concept to indispensable tool.

The Power of Collective Standards

Dismissing government involvement as a mere "silver lining" is like calling the sun a "helpful light bulb." Let's not overlook crucial historical precedents.

The internet (ARPANET), GPS, and even time zone standardization were government initiatives. The metric system, USB, and Wi-Fi all benefited from coordinated efforts beyond market forces. These standards didn't become universal because the market willed them so, but because of coordination to prevent fragmentation.

It is important not to ignore this. Preventing the chaos of incompatible systems often requires patience and collaboration—yes, even more patience than the industry has shown to this point.

Scattered Seeds Grow Diverse Forests

Your lament that SSI's "heat map is too scattered" misses the point. This diversity isn't a bug—it's a feature. The applications sprouting in education, finance, healthcare, and agriculture worldwide are the seeds of a diverse forest, not a failed monoculture.

Yes, SSI's initial user experience may be challenging, but that's par for the course with transformative technologies. Remember when 2G networks made downloading a single image feel interminable? Or when dial-up internet's screeching was our daily soundtrack?

Every groundbreaking technology endures growing pains. Users adapt, technologists adapt, and the market adapts with them—especially when the endgame offers control, privacy, and empowerment.

Business Goals Aren't Universal Truths

There's a profound difference between "We couldn't make it work for our business" and "The technology is inherently flawed."

Your focus on immediate market adoption overlooks the long-term value of doing the right thing—the hard thing—the thing that maybe, honestly, is not suited for startups that need to show growth and profit now. Perhaps a successful SSI startup is better suited as a B-Corp.

While market challenges are real, the problems that led us to SSI have only grown more severe. Data breaches are no longer news; they're an expectation. Surveillance Capitalism has entrenched itself deeper into the fabric of our digital lives, commodifying our every interaction. Government-issued digital identities run on the hardware of monopolistic tech giants, laced with mandates that threaten to erode what little privacy we have left.

SSI isn't just a novel technology; it is our defense against these escalating digital threats.

Focused Implementation

While the post laments the scattered nature of SSI adoption, I see this differently. There is a clear answer to the diversity challenge: focus on specific, high-impact use cases. Our net is intentionally not cast wide.

We don't require credentials to be used across different domains from which they were issued. That's a potential bonus, not a requirement. It does not undermine the original goal of the technology; optionality is a strength. It leaves the door open for future cross-domain adoption without dependence on it for current success.

This focused strategy allows us to deliver value in targeted areas while building the infrastructure for broader adoption. It’s a pragmatic approach that addresses immediate needs while paving the way for SSI’s long-term vision. An iterative approach helps build confidence in the SSI ecosystem, one application at a time.

The success of SSI won't come from boiling the ocean. We’re laying the groundwork for future interoperability while delivering value today.

A Vision Beyond The Fiscal Calendar

While some may chase immediate market opportunities, many of us remain committed to SSI's promise. Transformational change isn't measured in fiscal quarters but in steady progress over years, many years. It demands fortitude, collaboration, and a willingness to look beyond short-term gains.

To the SSI community: Let this be our rallying cry. We're not just building a product; we're creating the foundation for human-centric digital identity. Let's continue to collaborate on improving user experience, pushing interoperability, and engaging governments and standards organizations. Every effort, no matter how small, contributes to a future where secure, privacy-respecting, self-sovereign digital identity is the norm.

The most compelling technological advances come from those who persevere when the path gets steep. The future of digital identity—one that isn't controlled solely by governments or tech monopolies—is counting on us.

credit: Dune

Should you find yourself at the next IIW, or the one after that, perhaps hosting a session titled "I Was Wrong About Being Wrong About SSI," feel free to invite us—we'll be there. We won't cry, clap, or hug you. We'll simply nod, in recognition that the most profound insights often come from pushing through challenges, not pivoting away from them.

Tuesday, 15. October 2024

Doc Searls Weblog

Identity as Root

This is from an email thread on the topic of digital identity, which is the twice-yearly subject* of the Internet Identity Workshop, the most leveraged conference I know. It begins with a distinction that Devon Loffreto (who is in the thread) came up with many moons ago: Self-sovereign identity is who you are, how you […]
Shot with a camcorder on a beach in the Caribbean during a 2001 Linux Journal Geek Cruise.

This is from an email thread on the topic of digital identity, which is the twice-yearly subject* of the Internet Identity Workshop, the most leveraged conference I know. It begins with a distinction that Devon Loffreto (who is in the thread) came up with many moons ago:

Self-sovereign identity is who you are, how you choose to be known, to whom, based on reasons of your own. It’s something you own, just like you own your body. Administrative identity is what organizations call you for their convenience (and secondarily, yours). We may call these labels “an ID,” but they are not who we are.

Here is what I wrote:

Humans have opposable thumbs. This makes them capable of tool use to a degree beyond measure. Perhaps nothing, other than their brains, makes humans supremely capable as a species. It also makes them grabby. Try to teach Marxism to a three-year-old yelling “It’s mine!”

My mother’s favorite account of me as a small child was how I walked around constantly holding as many small toy trucks in my hands as I possibly could, unwilling to let them go. But this tendency was about control more than possession. I hadn’t yet learned to put my trucks in something I could carry around. I was unwilling to trust that a box or a bag was a working extension of my grabby little self.

I’m still a bit like that. “Your trucks” is what Joyce calls the electronic stuff I carry around. But I’m not alone. We conceive everything in terms that imply or involve forms of control, possession, or both. The English language, among many others, cannot get along without possessive pronouns: my, mine, our, ours, their, theirs, your, yours, hers, his. Even if ownership in the legal sense is not involved, responsibility is. Control is. When you drive a rental car, those are your wheels, your bumpers, your engine. You also think and talk about them with first person possessive pronouns.

Personal agency moves outward from that sense of control and responsibility over what is ours, including our selves.

This is why we need to start understanding personal identity, and how it works in the world, by recognizing that each of us is a self-sovereign human being. We are each, as William Ernest Henley put it in his poem Invictus, the captain of our “unconquerable soul.” Walt Whitman’s Song of Myself was a long-form explication of the same thing. (Which I wrote about way back in ’96, before there were blogs.)

This is not to deny that we are also profoundly social. But it is essential that we start with the personal.

Ownership is an abstract concept. So are rights. But we need both to operate civilization.

What makes ownership righteous is that it is minimally abstract. We see this with the possessive pronouns “my” and “mine.” Again, this is rooted in our possessive nature, our opposable thumbs. We need to be self-possessed (and -sovereign), autonomous, and independent—before we are anything else, including our social selves.

In technical terms, it’s root.

So there ya go.

*But not the only subject. I’d say about half of the topics that come up at IIW are about something other than identity.


Riley Hughes

Why Verifiable Credentials Aren’t Widely Adopted & Why Trinsic Pivoted

A presentation I gave at IIW38, in April 2024. At the Internet Identity Workshop 38, I announced my talk with a description something like this: Five years ago at this conference, Trinsic’s first product (our self-sovereign identity wallet app) was launched. Today, I’m hosting a session called “SSI didn’t work. Trinsic is pivoting.” I knew it was a provocative title — and I fully consid
A presentation I gave at IIW38, in April 2024.

At the Internet Identity Workshop 38, I announced my talk with a description something like this:

Five years ago at this conference, Trinsic’s first product (our self-sovereign identity wallet app) was launched. Today, I’m hosting a session called “SSI didn’t work. Trinsic is pivoting.”

I knew it was a provocative title — and I fully considered that apostatizing this publicly could result in a mob of opposition. But I was not prepared for what actually happened.

Right after my announcement (and brief, surprised gasp from the crowd of ~350 people) one IIW veteran popped off his chair and clapped his way across the room to embrace me. Right before my session, the CEOs of two separate identity companies asked to broadcast the talk online (one via livestream, another via live Tweet). Right after my session during the closing circle (where the original crowd of ~350 reconvened), the cofounder of the conference wept as she thanked me. Another longtime identity veteran stood immediately after and claimed it was the most important session held at the event in the last 5 years. The next morning, a startup CEO called for memorializing a “Riley Session” as a new staple of the conference twice each year. More than a dozen people asked me to reprise my session.

I say this all descriptively, not boastfully. Far from feeling proud, I actually felt humbled to have had the opportunity to contribute something evidently valuable to a community which has given me so much. (It was at an IIW in 2018 that I realized I wanted to build a career in this industry.) That is the context for this post — to share apparently-useful insights that hopefully will result in getting the world closer to using better digital ID technologies.

The spoiler for this article: I’m no longer betting on verifiable credentials taking off. I mean this literally — 99% of my net worth is Trinsic stock, so the strategy I bet my company on is an objective revelation of my strongest convictions. While once central to Trinsic’s existence, our latest pivot demonstrates that I believe interoperable digital credentials won’t succeed in market anytime soon.

Specifically, my prognostication is that digital identities will continue to be deployed and adopted, but that fragmentation, incompatibility, and bespoke implementations will continue to dominate the market.

Now, I’ve been working on this problem for 7 years. Explaining how I landed on this conclusion requires expounding on years of the business equivalent of trench warfare. This post is a deep dive, not a summary. If you want a summary, I suggest dropping this post into your LLM of choice (I won’t be offended). I’ll do my best to be concise, but I also intend on being thorough. The structure of this post is as follows:

Assumptions

Brief History

Impossible Problems

Bull Case

End

Assumptions

Assumption #0: I’ve drunk the Kool-aid

As you read this post, your walls might go up. Especially if your salary depends on me being wrong. Just be aware of your biases and know that I am still Kool-aid drunk on the philosophy of self-sovereign identity. I want that future as much as anybody. But in my sincere pursuit to discover reality, I’ve learned some things that I hope others can benefit from.

Assumption #1: Terminology

I will use “verifiable credential” or “VC” in this post quite a bit. I’ll also use “self-sovereign identity” or “SSI” or “IDtech”. “Decentralized identity” or “DI” could be subbed in too for all I care. Fundamentally, I’m trying to communicate: A standard way to exchange trustworthy data in a user-centric/user-controlled way. But I’m intentionally avoiding being too pedantic, so don’t get overly caught up in semantics.

Assumption #2: The world is huge

The world is inconceivably big.

Consulting in SSI is a good way to make a living. But hand-to-hand combat is insufficient to get VCs adopted. The only way to scale something on the magnitude needed for SSI to work is a product with product/market fit. That, or regulatory obligation.

Imagine if companies needed Accenture to adopt artificial intelligence. Now consider how that differs from a world where any developer can pick up an API key from OpenAI and implement GPT4 in an afternoon. LLMs have product/market fit, SSI does not.

Assumption #3: The market is reality

Here’s how I think about startups & new product development. The market is unknowable in its entirety. If it was known how to build a successful product (in our case, a widely-adopted SSI wallet or an adoptable VC format), it would have been done by now. To make progress at the cutting edge, you must do something desirable and different. Andy Radcliff calls this being “non-consensus and right”.

Chamath describes a startup as an exercise in mentally modeling the “market” and building a product to fit that model. If your product is rejected, your model of the world is wrong and you need to iterate its form. If your product is adopted, your model of the world is right.

All this is to say, the market is the objective reality. It decides winners and losers. As I’ve sought to get product/market fit in this space, I seek to discover reality as impartially as I can. When I see evidence of something working, I pay attention; when I see evidence that something isn’t working, I try to absorb those data points too.

Assumption #4: The best technology gets adopted

The worth of software is its impact on humans. The most elegant, idealistic, best software is therefore worthless unless it’s adopted. Actually, if it’s not adopted, it’s not elegant, idealistic, or best — no consolation prizes. The market is the judge and if your product isn’t making an impact on people’s lives, it’s no better than a refrigerator on the moon.

Adoption isn’t just the most important thing, it’s the only thing. Thus, this is actually a tautology: by definition, what gets adopted is best.

Assumption #5: Don’t waste your time

I’ve thought a lot about the optimal point between discouraging and merely cautioning. Yes, I’ve tried a lot of things, but it’s important to state that markets change. Maybe something is different now than when I tried it. I surely can’t turn over every stone. Running new or differentiated experiments in this market is probably fruitful–I don’t want to discourage innovation. But I do want to shine a light on the graveyard that exists in this space and hopefully prevent people from doing things that are doomed to fail.

Brief history

I started my journey in identity in 2017 when I interviewed for a job at Sovrin Foundation. I started Trinsic with two cofounders in 2019 because of the relatively low adoption of Sovrin’s network. I thought if we made the technology dramatically easier to use, more companies would adopt SSI.

Me plus Tomislav & Michael, my cofounders, launching our product at IIW in 2019.

We built something good. Hundreds of developers were signing up every month and issuing verifiable credentials. One of those developers was Wayne Chang who, before cofounding Spruce, gave me the feedback that Trinsic was the best execution of any SSI product in the market. He asked something like:

“With a product like this (where a developer can get value for free) you are probably either using this as lead gen for $250k consulting deals, or you’re just really betting on verifiable credentials blowing up.”

I told him we were betting on VCs blowing up.

Our aim was to build a product that would accelerate VC adoption. In our view, if a customer needed consulting to succeed, it was a signal the product wasn’t good enough. (We did consult customers on how to succeed — but we treated this time as product discovery time, and incorporated the results immediately into the product roadmap.)

What happened is we acquired dozens of paying customers, but not enough were successful enough. We had roughly 1 breakaway customer story per 500 companies that signed up. The rest were either startups that failed to get traction or large enterprise R&D departments that never went to production.

In the meantime, the market took shape in an undeniable way. Dozens of companies obtained product/market fit in the reusable identity space, none of which use the VC standard. Either none of the successful companies chose to use VCs, or none of the companies that chose VCs succeeded (two statements of identical outcome but different presumed causation).

A slide I made in 5 mins immediately before my IIW38 presentation to make this point!

In fact one such company, which raised over $75m from top-tier venture capitalists to fuel its meteoric growth, said to me (paraphrasing):

“The SSI people had a good idea. So we’re applying that idea to get customers while they’re wasting their time in working groups. You can think of us as proprietary SSI.”

It’s been a few years since I was told that. I recently checked back in on them. They’re used by tens of millions of users.

Another reusable ID company used by several million end-users, which has raised 10s of millions, told me (paraphrasing):

“We are building a monopoly. Our goal is to own the identity layer of the internet. It might not work, but if it does, we’re going to make a lot of money.”

Both companies were started after 2019 (the year the VC data model became a W3C recommendation) and chose not to use standards.

This dynamic vexed me for months. Why was the market rewarding companies with inferior identity solutions?

It took me too long to conclude that SSI is the inferior identity solution. The market determines winners and losers, better and worse — and there are important reasons the market is rejecting verifiable credentials in favor of proprietary solutions.

After 4+ years of hypothesizing, experimenting, evaluating, and iterating, I found what I consider to be intractable problems that block adoption of verifiable credentials. I do not know the answers to these problems. If you’re building in the IDtech space, you should take these problems seriously because, in my view, these are the billion-dollar questions you’ll need to crack to have any meaningful success.

Impossible problems

Impossible problem #1: Interoperability is the premise, but doesn’t exist

Standards exist to enable compatibility. If interoperability isn’t important, simple alternatives to verifiable credentials exist (such as signing — or not signing — JSON). The ability to share data with an arbitrary 3rd-party in a standardized way (without bespoke integration or direct connection to issuing source) is the core purpose of VCs. But that doesn’t actually exist right now.

In other words, the reason for the existence of VCs is not present.

This is because interoperability at the application level requires technical interoperability of keys/signatures, public key resolution, exchange protocols, request/response formats, data models, semantics, and a host of underlying elements. (Not to mention nontechnical interoperability elements, like business models and trust frameworks.) Because a myriad of VC variants, exchange protocols, signature suites, etc. have emerged over the years, which must all line up to effectively interoperate, the only practical way to achieve interoperability with others is to use the same software.

Theoretically, if everyone in the industry agreed upon a single standard to use, interoperability would be present, and there would begin to be value in adopting VCs today — but there’s a problem: which standard should be adopted? And what if it’s the wrong one?

Prematurely standardizing is a risky endeavor. But in order to find out what kind of digital credential will have real product/market fit — and thus be adopted — the world needs to experiment, iterate, and innovate over trade-offs to discover what the market will adopt.

Do you see the problem? VCs need innovation, iteration, and experimentation to succeed. But they also need standardization, agreement, buy-in, and stagnation to have value. These two forces are directly opposed. I wrote about this dynamic more in a post several years ago.

An image I made in 2022 for a post I wrote about the tension between interop & innovation.

Standardization is essential for adoption. But adoption is essential for knowing what to standardize (there’s nothing worse than standardizing the wrong thing)! Prematurely standardizing before adoption is a classic “cart before the horse” scenario.

To be clear, none of this means that there is no reason to adopt VCs now — it means every reason to adopt VCs today is in anticipation of future value. Because today, verifiable credentials offer zero marginal value relative to simpler alternatives. To rationally adopt VCs, one must make a bet on the future. Some bold companies will be willing to take that very costly journey (like we did), but most will not.

Impossible problem #2: Initial UX is worse than baseline

Verifiable credentials are always a better UX than existing identity solutions at scale, but always a worse experience initially.

I had a mask-laden conversation with the late Vittorio Bertocci at the Internet Identity Workshop in 2021 where I was working to convince him of the merits of VCs as he advocated for federated approaches.

His point: VCs aren’t necessary to share data outside its original domain. Take, for example, Airbnb wanting to verify a digital driver’s license. Instead of using a VC, DMVs could become an IdP, and Airbnb could add a “verify with X driver’s license button” into their app. The tech exists today.

My point: Yes, but nobody does that today because it doesn’t work in person, doesn’t work offline, doesn’t work when presenting multiple credentials, doesn’t work when privacy needs to be preserved, etc. But most importantly, in this example Airbnb would need to add a “verify with X” button for all 50 states, which is a very bad UX — VCs would enable one universal verification button!

I’ve thought a lot about this interaction. While I like to think I helped nudge Vittorio a bit on the value VCs could provide, you’ll notice my points are basically only relevant at scale.

For a concrete example, take AI agents who need identity wallet capabilities. We ran a whole campaign about this at Trinsic and spoke to dozens of AI agent developers (see this landing page, watch the demo video!). It’s obvious the problem exists in the long-term. As a user, it’s a pain to provision an agent access to all my accounts, info, etc. needed for it to accomplish real tasks for me. But in the near-term, 100% of the AI developers we talked to just chose to use OAuth to connect to sources where users’ data exists already (e.g. Google).

If a user already has a wallet with a bunch of useful credentials in it, obviously sharing those is a better UX than redoing the onboarding process that was required to obtain those credentials. But acquiring the wallet with credentials in the first place will always cause extra friction upfront, which businesses and users are very reluctant toward.

This problem is compounded because businesses don’t just adopt things that are a little bit better than alternatives. In order to switch behaviors or risk their careers on unproven technologies, new solutions need to be better enough to justify that extra investment. And unfortunately for VCs, for every initial use case I’m aware of, the experience is objectively worse.

I shared more insights from Vittorio here.

Impossible problem #3: Heat map is too scattered

When I was raising money for Trinsic as a general-purpose verifiable credential platform, investors would often ask what the most prominent use case for verifiable credentials was — and more specifically, what the most common use case for Trinsic’s customers was. My answer sucked:

Our customers are building use cases in education, employment, financial services, healthcare, insurance, industrials, government, and agriculture. And more!

Once, in pursuit of a better answer to this question, we plotted the industries, geographies, and use cases of 100+ paying customers. The scatter plot was all over the place. Nothing stuck out prominently. It didn’t make sense to double down on a specific industry or use case because none dominated our demand. We didn’t even have geographic consistency — adoption existed from Myanmar to Peru to USA to Romania.

My cofounder Michael continuously sounded the alarm in my ear: in order for VCs to take off, a network effect needs to take hold. No network effect will exist if all adoption is completely scattered.

This problem doesn’t just affect Trinsic, it affects the entire VC space. Until pockets of adoption breed more adoption in a virtuous cycle, the rate of adoption will be capped at linear growth — and reaching 1+ billion people requires exponential rates.

To drive this point home, I’ll cite an example from a packed room at a previous Internet Identity Workshop, circa 2020. In a room full of decentralized ID vendors, I asked for a raise of hands: “Who has ever heard of a verifiable credential being used for a purpose other than that for which it was issued?” (In other words, who has ever heard of a verifiable credential being used for the reason the VC standard was created.)

No hand went up.

I asked again in 2024. Still crickets.

Verifiable credentials won’t take off until this sort of thing is commonplace. This cross-context data sharing simply won’t happen until both sides of the exchange are interested in similar credentials, or are from a similar industry or geography. In other words, this won’t happen until the heat map of potential use cases lights up in a particular spot. But the current nature of the market is very scattered.

Bull case

This post outlines why I’m not betting on verifiable credentials taking off anytime soon. But if I had to articulate the bull case, there are two arguments here: government mandates and timing/direction of causality.

Governments

Governments represent the silver lining for verifiable credentials. While, in my view, the private sector lacks a viable adoption path for the reasons outlined above, governments have different incentives. Governments build utilities, and there is a good argument that interoperable digital ID is a government utility. Many governments in the West are investing in VCs, most notably through eIDAS 2.0, although even that is far from settled.

Government involvement is not limited to government-issued credentials. Virtually every private sector company in the verifiable credential world (except Trinsic, for what it’s worth) has been subsidized by governments funding their preferred VC standards (DHS funding different standards than the EU, for example). Various technologies through the years have succeeded due to government mandates or subsidies–nothing wrong with it inherently–but it is a telling aspect.

Governments compelling use of interoperable digital credentials represents the most likely success case for VCs I can think of. But that scenario exists in the medium-term, not the short-term.

Timing

The other argument says that verifiable credentials are simply too early. Many of the leading reusable ID ecosystems like IDme, Yoti, CLEAR, etc. were started and initially grew before the VC standard was finalized/stabilized. So one could argue these companies, if started today, could have had the same or greater success using verifiable credentials rather than a proprietary solution.

Unfortunately, companies like Incode, Footprint, Plaid, and many others have recently launched “reusable ID” products and, after extensive research, chose not to use VCs. They have succeeded far beyond most SSI products, which presents a difficult rebuttal to this argument.

One cannot rule out the possibility that companies can succeed betting on the VC standard. But it’s difficult to point to examples of VCs helping a business solve problems for customers (and thus make money) — and it’s trivial to point to dozens of examples of companies sinking millions of dollars into verifiable credential infrastructure (money which could have been spent acquiring customers). One hypothesis is that tinkerers are attracted to VCs while aggressive operators are less likely to experiment on new technologies or make speculative bets on the future.

End

I referenced Vittorio earlier, and I’ll cite him one last time here. In a blog post about his involvement in the CardSpace project at Microsoft, which is a spiritual ancestor of modern SSI, he said:

“When the user centric identity effort substantially failed to gain traction in actual products, with the identity industry incorporating some important innovations (hello, claims) but generally rejecting many of the key tenets I held so dear, something broke inside me. I became disillusioned with pure principled views, and moved toward a stricter Job to be done, user cases driven stance.”

I’ve not changed my convictions on the future I want to live in. Nor has my resolve wavered in my desire to build that future. But the market has spoken. Prematurely standardizing a pre-product/market fit technology won’t work — no matter how badly zealous advocates of SSI want it to. That is why Trinsic is no longer a general-purpose verifiable credential infrastructure company.

While I don’t expect a universal standard for digital IDs will happen anytime soon, new digital IDs continue to roll out every day. These digital IDs are sometimes built with VCs, other times with mDLs, but usually in proprietary ways. Some are bound with biometrics, others with passkeys. Some are focused on solving a problem in a specific industry, others in a specific geography.

Today, the Trinsic Identity Acceptance Network represents the future of Trinsic. Instead of helping companies create digital IDs (issue verifiable credentials or create wallets for users), we help businesses accept existing digital IDs. Our strategy is a bet on continued fragmentation of the ID landscape — a bet on no single standard winning out.

In fact, I’ve become convinced that Trinsic’s current strategy to help businesses accept digital credentials is actually a necessary stepping stone to a future world powered by interoperable digital credentials. We’re still ushering the world in that direction, just from a different angle.

Of course, I could be wrong, and I hope I am. My ears and eyes are wide open for new information. My only objective is to see reality as clearly as I can. If you have information you believe I’m missing, I’d love to hear from you and have a conversation. Sincerely! You can DM me on LinkedIn or Twitter, or shoot me an email riley@trinsic.id.

Thanks to Kim Hamilton Duffy, Gerald Glickman, Timothy Ruff, Eve Maler, James Monaghan, Dave Grantham, and Zachary Jones for giving feedback on a draft of this post.


The Pragmatic Engineer

Why techies leave Big Tech

A job in Big Tech is a career goal for many software engineers and engineering managers. So what leads people to quit, after working so hard to land these roles?

Hi – this is Gergely with the monthly, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. If you’ve been forwarded this email, you can subscribe here.

In case you missed it: the first two The Pragmatic Engineer Podcast episodes are out: Efficient scaleups in 2024 vs 2021 and AI tools for software engineers, but without the hype. Each episode covers approaches you can use to build stuff – whether you are a software engineer, or a manager of engineers. If you enjoy podcasts, feel free to add it to your favorite player.

Ask a hundred software engineers what their dream company is and a good chunk are likely to mention Google, Apple, Meta, Microsoft, and other global tech businesses. This is no surprise:

Brand value – few businesses in any sector are better-known than Big Tech

Compensation – pay is usually at the very top of the market. Ranges can get especially high in staff-and-above positions. We cover more on pay in The trimodal nature of tech compensation and in Senior-and-above compensation in tech

Scale – infrastructure used by hundreds of millions or billions of users, data storage measured in petabytes, and services which handle thousands of requests per second, or more

With upsides like these and others, why walk out? To learn more, I asked several software engineers and engineering leaders who did precisely this. Personal experiences vary, but I wondered if there are any common threads in why people quit prestigious workplaces. Thanks to everyone who contributed.

In today’s deep dive, we cover:

Big Tech less stable than it was. Big Tech had few to no mass layoffs for years – but that’s all changed.

Professional growth in a startup environment. It’s hard to keep learning in some Big Tech environments, compared to startups.

Closed career paths at Big Tech. It’s often more achievable to get to executive-level engineering positions at scaleups, than at global companies.

Forced out. The bigger the company, the more there’s politics and scope for workers to be victims of competing interests, personalities, and rivalries.

Scaleups get “too Big Tech.” Scaleups are nimble and move fast with few processes. Successful scaleups slow down and add more process.

Steep compensation drops. A falling stock price can make people consider leaving if it also drags down their compensation; so can an initial equity grant vesting out after 4 years.

Raw feedback. A former engineering leader at Snowflake shares their take on how people leave workplaces – or struggle to; golden handcuffs, a Big Tech hiring slowdown, a bifurcated market, and more.

Some reasons techies leave large and prestigious tech companies

1. Big Tech less stable than it was

Luiz Santana was a tech lead manager at Google in Germany, before leaving to cofound the health tech startup, Digitale Patientenhilfe. Before Google, he was a senior engineering manager at fintech N26, and head of engineering at ride-sharing app, FREE NOW. Luiz shares why he decided to say farewell to what looks like a techie’s dream job:

‘Some things helped me make the decision to leave Google:

The startup I got an offer from had raised healthy funding, meaning a good amount of runway

I managed to negotiate to join as a cofounder and CTO, which meant a healthy equity package.

The other two cofounders have a great track record with a previous startup. This gave me confidence.

‘Google changed a lot, which also made the decision easier:

Google had layoff tension at the time. In mid-2023, there were lots of small, unpredictable layoffs, which made Google feel less of a “secure” place to work.

The culture also changed visibly. There was cost cutting, ranging from small things like reduced snack selection, to some teams having trouble securing servers they needed for organic growth.

I realized I am no fan of promotion-driven culture, which I saw resulting in outcomes I didn’t like.

‘Google makes it hard to leave. Some big factors held me back from quitting:

Compensation. The biggest challenge, by far! Google pays very well, and few if any companies can match the total package. In my case, I received my stock grant at half the stock price it was later vesting at. This meant that my equity compensation was already worth double!

Brand. I have always been a big fan of Google products, and being associated with Google is a big positive in so many ways.

Risk. Staying at Google is lower risk – even with layoffs – than joining a startup is.

‘Personal circumstances made the decision to quit easier.

I had “layoff fatigue” keeping track of all the small layoffs in various teams.

In Germany, I was at higher risk of being laid off because I was not married at the time and do not have kids. There’s a “social criteria” for redundancies in Germany, and I was not in a protected bracket; if a layoff happened, I was a prime candidate.

I trusted the founders, and appreciated that they spent a lot of time with me, sharing their vision.

‘After a lot of back-and-forth, I finally pulled the trigger to join the startup. I’ve not looked back since!’

Luiz’s hunch about diminished job security echoes the reality. Since mid-2022, Big Tech has shattered its image for job security:

Meta let go ~25% of staff in 6 months in two separate layoffs. Before November 2022, the company had never done mass redundancies; then it did two.

Google never did repeat annual layoffs in its history until 2024, following shock layoffs in 2023

Amazon made deep cuts in 2023. The company is also known for less job security due to using PIPs more than most other Big Tech companies. PIPs are used as part of meeting unregretted-attrition targets

Microsoft did large cuts in 2023 and smaller layoffs since

Tesla did widespread layoffs in secret in 2022, hitting long-time employees with a 1-week severance package.

Apple and NVIDIA are the only two Big Tech companies not to do mass layoffs in the last two decades. 

2. Professional growth in a startup environment

Benedict Hsieh is a software engineer based in New York City, who spent five years at Google, before quitting for a startup in 2015. Ben describes his journey:

‘I didn’t want to become a ‘lifer’ at Google. This was the specific reason I left Google: I felt I was headed in a direction of being stuck there for life. I was only learning Google-specific tech, and the position was not very demanding. I felt like I should be working harder and learning to create value on my own, instead of only functioning as a cog in the machine.

‘I’d stopped “exploring” and was mostly “exploiting.” There is a mental model I like called the explore-exploit tradeoff. Exploitation means you choose the best option based on your current knowledge. Exploration means you try out new technologies and approaches. Reflecting on my day-to-day work, it felt that almost all of it was “exploiting,” and I was doing very little “exploring.” It was too early in my career (and life) to stop exploring!

‘I think my mentality of worrying about not doing enough “exploring” is rare. Almost all my former peers are still at Google because the total compensation is really, really hard to beat!

‘Looking back, I was overconfident about how quickly I would grow in startup-land – both professionally and in the financial sense. I was willing to take the hit on significantly decreasing my total compensation, and getting a larger chunk of startup equity. I was impatient about hitting my “retirement number” by joining a fast-growing startup with much higher upside.

‘Also, to be frank, I figured that I could go back to working at Big Tech anytime I wanted: because I spent enough years there, and had a pretty good existing network.’

Ben joined a startup as a cofounder. The experience was not what he expected, as he wrote about:

‘I was miserable. We were working out of [my cofounder’s] unfinished apartment, which was freezing cold in the middle of the winter and a constant reminder of all the things that weren't going well. I'm a low-conflict person who needs hours to calm down after an argument, whereas she preferred communicating via loud debate.

‘I was trying to learn all kinds of things that we needed for our business – how to work with clients, keep our servers up at all hours by myself, debug statistical anomalies in our data, or send out cold emails to find new business. I was the only one who could do these things, so I got them done. I woke up early in the morning and had trouble sleeping at night. Once I worked past midnight to compile a report for a client who'd requested a last-minute meeting in the morning, only for them to no-show, followed by an email two days later asking me why I hadn't found another way to send them their data. If I had asked my body what it wanted in that moment, it surely would have responded with incoherent screaming. It basically did that without being asked.

‘Our company folded in less than a year. 

‘But in eight stressful and mostly unpleasant months I accomplished more than I had in the eight years before that. We made some money for our clients, and a minimal but nonzero amount for ourselves, and I was able to parlay the experience into an early position at a much more successful startup. More importantly, I learned how to just get things done when they need to be done, instead of feeling like a helpless bystander watching a car crash.’

Ben reports that the new startup he is working at is doing a lot better, and reckons he needed a “startup shock” to develop his professional skills beyond the (comparatively) neat and tidy confines of Google.

3. Closed career paths at Big Tech

A product manager based in Seattle worked in Big Tech for 14 years: 3 at Amazon, and 11 at Google, where they went from a product manager on a single product, to senior product manager, group product manager, and product lead for a portfolio of products. Despite promotions into influential positions, they quit the search giant for a fintech startup, as VP of Product. They asked to remain anonymous, and share:

‘I'd already decided to quit Google without a new gig lined up. This was because I couldn't find a new role that was a combination of interesting challenge, interesting people, and/or one that fulfilled my career goals. I had over 50 conversations inside Google for ~9 months. 

‘I talked to many ex-Googlers and ex-Amazonians during interviews. I'd never heard of my current company prior to joining, but most people I met during the interview were ex-Googlers/Amazonians. They were tackling the worthy, difficult problem of building a truly modern fraud monitoring and management platform.

‘This company isn't a remuneration leader by any means. "Closing" a candidate – them accepting an offer – is a combination of:

A strong “sell” during interviews

Showcasing the concentration of world-class talent at the company

Highlighting that the team ships very fast – much faster than Big Tech!

Articulating the interesting technical and product challenges the team overcomes

‘Despite not knowing about them, it turns out this business has a strong brand in the banking software sector. They have established business moats, and the more I learned, the more impressed I was.

‘The company is in the middle of an organizational turnaround that I get to be an active part of, as a VP. This challenge appeals to me because I get to work with a really motivated set of people who are focused on making a big difference within the company, but also across the financial industry.’

This journey from Big Tech middle-management into leadership at a scaleup makes a lot of sense. Making the jump from engineering manager or product lead to an executive position is close to impossible at Big Tech because the change of scale is vast. An engineering lead might have 10-50 reports, but a VP or C-level will oftentimes have 10x more. There are exceptions, of course, like Satya Nadella, who rose through the ranks at Microsoft, from software engineer, through vice president, to CEO. But in general at large companies, getting promoted to the executive level is formidably difficult. Scaleups offer a more achievable path to C-level.

At the same time, tech professionals with managerial experience in Big Tech are often perfect fits for senior positions at scaleups. Recruitment like this can be a true win-win! A new executive gets to learn a lot by getting hands-on with strategy, attending behind-the-scenes meetings, liaising with the board and investors, and many other experiences that are simply off limits at Big Tech.

In exchange, the scaleup gets a seasoned professional who doesn’t panic when facing decisions potentially involving tens of millions of dollars, and who can make correct, well-informed decisions – which is what Big Tech managers do, usually.

4. Forced Out

Working at Big Tech is far from perfect; the larger the company, the more organizational politics there is, some of it bad.

Justin Garrison, former senior developer advocate at AWS, felt this after he posted an article that criticized the company, entitled Amazon’s silent slacking. In it, he wondered if Amazon’s sluggish stock price was the reason for its strict return to office (RTO) push, and whether it was a way to quietly reduce headcount via resignations. Justin shared other observations in the article:

“Many of the service teams have lost a lot of institutional knowledge as part of RTO. Teams were lean before 2023, now they’re emaciated.

Teams can’t keep innovating when they’re just trying to keep the lights on. They can’t maintain on-call schedules without the ability to take vacation or sick days.

The next logical step to reduce costs is to centralize expertise. It’s the reason many large companies have database administration, network engineering, or platform teams.

They’ll have to give up on autonomy to reduce duplication. Amazon has never had a platform engineering team or site reliability engineers (SRE). I suspect in 2024 they’ll start to reorg into a more centralized friendly org chart.”

Justin was also hit by the layoffs: his team was eliminated, but not his role. He was left in a limbo state of needing to find another role within the company, and was not offered severance. Justin suspected Amazon was aiming to avoid paying severance packages, and incentivised managers to put engineers on a performance improvement plan (PIP) and let them go without severance.

In the end, Justin didn’t want to go through what he predicted would be a demotivating, unfair process that would end in him being fired. So, he quit. 

Afterward, he joined infrastructure startup Sidero Labs as head of product, building what they aim to be the best on-premises Kubernetes experience.

Ways out of Big Tech manager conflicts

There’s a saying about quitting that “people don’t leave bad companies, they leave bad managers.” It contains a kernel of truth: a bad manager is often reason enough to leave because it’s the most significant workplace relationship for most people.

At large companies, there is an alternative: internal transfers. As an engineer, if you feel held back by your manager or team, you can attempt to move. For someone changing jobs, internal transfers are usually a lot less risky than interviewing externally. With an internal transfer, you get to keep your compensation and network inside the company; in fact, you grow it. Also, your knowledge of internal systems and products is valuable.

There are usually a few requirements for an internal transfer to happen:

Minimum tenure: internal transfers are usually open to those who have been at the company or in their current team for a year or more.

Good standing: performance reviews which meet expectations are usually needed in order to move. This is to avoid low performers escaping by switching teams. Being on a performance improvement plan (PIP) is a blocker to moving at most companies.

Other teams’ headcounts: internal transfers can only happen when teams have the budget for your level. Internal transfers are a way to hire more efficiently.

Pass an interview: at many companies, internal transfers go through an internal interview. This is usually a lot more lightweight than external ones. The process usually depends on the manager. It might be a simple chat and review of your existing work, or be more competitive if there are other candidates. For example, at Microsoft/Skype, when I changed teams as a developer, my new manager had internal candidates do a software architecture interview.

Get approval from the existing team. At some places, this can be a thing! An existing manager can slow down a transfer, or even sometimes veto it. However, in practice, if an engineer and manager have a poor relationship but the engineer has decent standing, then the manager doesn’t have much reason to block their departure. Of course, a manager may be able to make the situation challenging enough that seeking opportunities externally seems like the better option.

5. Scaleups get “too Big Tech”

An engineering leader spent four years at Snowflake after joining in 2019, right before its IPO. They’ve asked to remain anonymous, and share why it was time to depart the data platform:

‘Snowflake became “too Big Tech” for my liking. When I joined, there was a lot of uncertainty within the company and teams moved quickly. We had to make rapid changes, and four years later, things looked different:

Stable teams

Mature and well-documented processes

Lots of internal committees

Ever-growing amount of documents

Endless program management work before starting anything meaningful

Lots of politics! Cliques formed and there was “empire building” in upper management.

‘I have to admit, none of this is for me; I’m more of a “move fast and build things” person. At the same time, I acknowledge that many people felt very comfortable with these changes, and thrive in them!

‘The reality is that the company became successful, quickly. I enjoyed being part of the ride and helping create this success, but the change in culture made it feel less close to me than the “old” culture.

‘Working at a scaleup that became “Big Tech” made it so much easier to leave! I’m certain that having Snowflake on my resume gave me a huge head start over someone equivalent from a medium or lower-tier company. If I didn’t have Snowflake on my resume, recruiters would have skipped over me, and hiring VPs would be extremely skeptical. 

‘So while there have been lots of changes in culture thanks to the standout success of Snowflake, it gave a lot of career options to me and everyone who helped build Snowflake into what it is today.’

6. Steep compensation drops

Big Tech compensation packages usually have three components:

Base salary: the fixed sum in a paycheck

Cash bonus: awarded at the end of the year at some companies. Netflix is among the companies which do not award bonuses

Equity: awarded as an initial grant that vests over 4 years, usually. Most Big Tech companies offer equity refreshers

The more senior a position, the more of the compensation is in equity. Tech salary information site Levels.fyi maps how Microsoft’s more senior positions offer considerably more equity, and how principal-and-above engineers usually make more per year in equity than in salary:

Microsoft’s typical US compensation packages for software engineers. Source: Levels.fyi

Rising stock prices make it hard to hire away from public companies

Equity is converted from a dollar amount into a number of shares on the issue date. This means that if the stock value increases later, so does the grant value. If the stock goes down, so does the grant value, and total compensation with it.

This connection is why it’s close to impossible for a company to tempt NVIDIA employees to leave the chip maker, if they joined in the past four years and are still vesting out their initial grants: NVIDIA stock is worth 10x what it was 4 years ago. So, let’s take an engineer who joined in October 2020 with a compensation package of $250K per year:

$150K base salary 

$400K in equity, vesting over four years ($100K/year at the issue-date value)

Four years later, this engineer’s 2024 total compensation is around $1.15M, thanks to stock appreciation:

$150K base salary

$1M in equity vested in 2024 (thanks to that $100K/year grant being worth 10x, $1M/year!)
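To make the arithmetic above explicit, here is a minimal Python sketch of the vesting math. The salary, grant size, and 10x price multiple are the hypothetical numbers from this example, and the issue-date share price is an assumed placeholder rather than a real NVIDIA figure.

# Hypothetical example: equity is granted as a dollar amount, converted to
# shares at the issue-date price; each vested tranche is then worth
# (shares vested) x (current price).
grant_usd = 400_000        # initial grant, vesting evenly over 4 years
issue_price = 10.0         # assumed share price on the issue date (placeholder)
current_price = 100.0      # assumed price 4 years later (10x)
base_salary = 150_000

shares_granted = grant_usd / issue_price       # 40,000 shares
shares_vesting_per_year = shares_granted / 4   # 10,000 shares per year

equity_vesting_this_year = shares_vesting_per_year * current_price
total_compensation = base_salary + equity_vesting_this_year

print(f"Equity vesting this year: ${equity_vesting_this_year:,.0f}")  # $1,000,000
print(f"Total compensation:       ${total_compensation:,.0f}")        # $1,150,000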

Falling stock price: big incentive to leave

Stock prices don’t only go up, they also go down; and when they do, the equity value of comp packages drops significantly. In May 2022, we covered how falling stock prices lead more people to leave listed tech companies. From The Pulse:

Some tech stocks have had a terrible past 12 months. Here are some of the tech companies which have seen their stock prices decrease the most since a year ago:

Stitch Fix: -79% 📉 

Redfin: -71% 📉 

Pinterest: -65% 📉

Toast: -64% 📉

Robinhood: -61% 📉

Zoom: -61% 📉

Roku: -60% 📉

Opendoor: -56% 📉

Docusign: -48% 📉

In comparison, some Big Tech companies have done well:

Nvidia: +107% 📈

Tesla: +63% 📈

Apple: +47% 📈

Google: +37% 📈

Microsoft: +34% 📈

Oracle: +20% 📈

Let’s take a senior software engineer who was offered a $350K/year package in March 2021. Let’s assume they got this compensation package at all of the above companies, and that the package consisted of:

$200K cash compensation (e.g. $170K base salary, $30K bonus target) 

$150K/year stock compensation ($600K in stock, vesting over 4 years). 

Here’s what their compensation would look like, assuming no cash compensation changes:

Stark difference in compensation outcomes due to stock performance
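As a rough sketch of what the comparison above captures, the same hypothetical package can be recomputed for a few of the 12-month stock moves listed earlier; only the equity portion moves with the stock price.

# Hypothetical $350K/year package: $200K cash plus $150K/year of equity
# granted at the March 2021 stock price. A later price change scales only
# the equity portion of annual compensation.
cash = 200_000
equity_per_year_at_grant = 150_000

stock_change = {   # selected 12-month moves from the lists above
    "Stitch Fix": -0.79,
    "Zoom": -0.61,
    "Apple": 0.47,
    "Nvidia": 1.07,
}

for company, change in stock_change.items():
    equity_now = equity_per_year_at_grant * (1 + change)
    total = cash + equity_now
    print(f"{company}: equity ${equity_now:,.0f}/year, total ${total:,.0f}/year")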

Back when these stock drops happened, my suggestion was this:

“If you’re an engineering manager at a company where the stock has dropped significantly: buckle up for a bumpy ride. Unless your company can deploy significant retention grants, you will likely see record attrition in the coming months. Make cases for these retainers, but know that companies have financial constraints: and this is especially the case if the stock underperforms for a longer period of time.

If you’re looking for a new position: at places that issue equity, you’ll need to take a bet on the trajectory of the company. Consider companies where you believe in the company, their products, and how those products will grow over the next several years.”

Over time, Big Tech stocks have done much better than those of many recently IPO’d tech scaleups. The biggest stock drop happened at Meta, at the end of 2022. In just 6 months, the company’s stock price dropped from $330 to $88 – a roughly 70% drop! Everyone who joined before 2022 saw their stock grants lose 50-70% of their value on paper. Recovery was uncertain:

Meta’s stock price 2019-2022. A drop from mid-2022 hit equity packages

That year was probably one of the best times ever to hire away from Meta, due to its reduced stock price dragging down overall compensation. From early 2023, Meta’s stock recovered rapidly; employees issued stock in 2022-2023 have seen its value multiply. From a total compensation point of view, it’s again hard to hire away from Meta:

Meta’s stock price since November 2022. Rising value means extra total compensation

We covered equity refresh targets per level in the US in Inside Meta’s engineering culture.

Four-year cliff

An event that frequently reduces compensation is the four-year vesting cliff, when the initial equity grant at Big Tech runs out. At senior-engineer-and-above and engineering-manager-and-above positions, these initial grants can be significant: it’s not uncommon for the equity vesting per year during the first four years to exceed the entire total compensation in year five. The problem is that when this initial grant runs out, compensation drops, because the company does not “top up” with a similarly generous grant. This can mean a 10-40% drop in total compensation – pretty demoralizing!

As a manager, I dealt with the problem of engineers hitting 4 years’ tenure, and their annual earnings dropping 25-30%. The same happened to my own compensation package: in year 5 at Uber, I would have made about 30% less than in years 1-4, due to the initial equity grant running out, and lower annual refreshers. In the case of Uber, the stock price stayed relatively flat, and the drop in pay was the difference between revised compensation bands, and the equity which joiners had managed to negotiate.

Some Big Tech companies make the “cliff” less steep. Speaking with an engineering leader at Meta, they told me the annual refreshers offered at L6-and-above levels (staff engineer equivalent and above) are usually large enough to ensure no major compensation drop.

However, there are also companies like Amazon where only top performers receive top-up equity. This means that after four years, those without equity awards see a major compensation drop, as their compensation then comprises only salary; Amazon doesn’t pay cash bonuses. When this happens, it’s a signal that Amazon doesn’t particularly want to retain someone. It’s common for engineers to start applying externally when their equity is set to run out.

When a company’s stock price keeps increasing, the 4-year cliff becomes more painful. In Big Tech there are compensation targets for every engineering level. People earning above this target get very little or no equity refreshers, as they are already above target.

Going back to the NVIDIA example: the imaginary software engineer on $250K/year in 2020 ($150K salary, plus $100K/year stock) is on track to make $1.15M in 2024, thanks to NVIDIA’s stock price increase. That engineer could see their compensation drop from $1.15M in 2024 to $150K in 2025, assuming no further equity refreshers. Even with an equity refresher of $400K over 4 years, their compensation would still drop from $1.15M in 2024 to $250K in 2025!
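Continuing the same hypothetical numbers, a short sketch of the cliff year shows how abrupt the drop is; the refresher amount is the assumed $400K grant from the example above, not a real compensation band.

# Hypothetical continuation of the NVIDIA example: the initial grant finishes
# vesting after year 4, so year-5 compensation falls back to salary plus
# whatever refresher (if any) was granted.
base_salary = 150_000
equity_vested_2024 = 1_000_000       # final year of the appreciated initial grant
refresher_per_year = 400_000 / 4     # assumed $400K refresher vesting over 4 years

comp_2024 = base_salary + equity_vested_2024                  # $1,150,000
comp_2025_no_refresher = base_salary                          # $150,000
comp_2025_with_refresher = base_salary + refresher_per_year   # $250,000

print(comp_2024, comp_2025_no_refresher, comp_2025_with_refresher)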

As a tech worker, it’s easy enough to rationalize that current compensation is outsized compared to other sectors; but you don’t need to be psychic to understand that a pay cut is demotivating; people are doing the same job as before for less money.

Assuming our engineer managed to save most of their gains from the incredible stock run, they might have a few million dollars in savings. This creates room for taking a risk, such as:

Joining another company for higher compensation (very small risk)

Joining a startup for a lower compensation package but more equity (moderate risk)

Cofounding a startup, taking a steep cut on compensation, but a high equity stake (high risk)

7. Raw Feedback

The engineering leader who left Snowflake after it became “too Big Tech” interviewed with several startups, and is in touch with peers still working in Big Tech. They share some unfiltered observations for people considering leaving big companies:

Golden handcuffs

'Golden handcuffs' are a big thing at companies like Snowflake. I know plenty of people who are still riding out significant equity grants from the last few years that increased several times in value.

‘Salaries have stagnated across the industry, though. Back at Snowflake, we hired some people who were overpaid, compared to the current market. I know this because I hired some of them! We offered above the market because in 2021-2022 we were desperate to fill positions, like everyone else! 

‘This is the problem with golden handcuffs: when you are lucky enough to have them, it’s hard to find anywhere offering more because you’re already above the market bands! So the only way to avoid a compensation cut is to stay.

Hiring slowdown

‘I have seen a slowdown in hiring across the tech industry, mostly at bigger companies. It also impacted people at the “lower end” of experience and domain expertise. If you are a very experienced engineer or engineering leader, or have some specific skills/knowledge that is in demand, the market is good in 2024!

‘Non-listed companies are still hiring more than public ones. I’ve talked with a decent number of strongly-growing companies and most want to hire experienced people.’ This observation tallies with one from the deep dive in August, Surprise uptick in engineering recruitment.

‘I’m an example of the demand for experienced people. I have not been actively looking for jobs – but out of curiosity, I made myself open to inbounds from recruiters on LinkedIn. In two months, I had interviews with engineering VPs at Series C and D companies. I am going to NYC next week for a half-day onsite, as the final step for one role at a Series D company. I haven’t actively applied to any jobs while doing so!

Bifurcated market

‘The current job market seems to be divided into two parts:

Experienced folks: If you are a senior, experienced person, especially with in-demand skills, there are options and the market is still moving steadily, if a bit slower than before

Junior folks: if you are more junior, or don't have unique experiences or skill sets, you are probably not going to see many opportunities in the current market

Risk takers favored:

‘There are two types of people when it comes to taking risks:

Builders and risk takers: people who like to build and grow teams and programs, who like taking risks, and jumping into the unknown with a bit of chaos. I’m someone who thrives on that; I get bored easily!

Incremental improvers seeking stability. Many people like to run things and make incremental improvements, from one stable job to another stable job. 

‘In the current environment, big and stable companies are not hiring so much. So the people getting jobs are willing to take risks with less predictable companies, and jump into some chaotic situations.

Tech industry becoming ‘tiered’

‘An article by The Pragmatic Engineer covers the ‘tiering’ of the tech industry, which I experienced at first hand.

‘At my job before Snowflake, I was around “mid tier” at a financial software company. I would have been stuck in this “tier”, but got lucky in that Snowflake was desperate to hire tons of people in 2019.

‘Joining Snowflake immediately catapulted me into a much higher-compensated group. Beforehand, I did not appreciate how massive the gap is between mid and top-tier companies! But I’m torn about this gap. On one hand, I really appreciate the compensation and career options. On the other hand, it irritates me how insular, incestuous, and hypocritical this is.

‘The upper tier literally feels like an old European aristocracy – and I’m saying this as someone who lives in the US! People help out their buddies, and are extremely suspicious of anyone not in their ‘club.’ It’s eye-opening to see how many people jump from company to company, taking their buddies with them. They all make lots of money, while keeping it exclusive and making sure it stays that way.’

Takeaways

Thank you to everyone who contributed to this look into why successful tech workers quit the most successful tech employers. When I joined Uber in 2016, it felt like the best possible place I could have onboarded to. Back then, Uber had very positive media coverage, was called the most valuable startup in the world, and was the quickest scale-up in history. And yet, in my first 1:1 with my manager after joining, the question I got was:

“So, what are you planning to do professionally after Uber?”

It was day one at the world’s most valuable startup; why was my manager asking what I’d do after this job? He later explained that he asked because he’d been in the industry long enough to know that 99% of people don’t retire at their current company, and he wanted to be a supportive manager for future career goals. So if someone told him they might try to do a startup one day, he would try to get them involved in projects where they could do more zero-to-one building. If someone said they would like to get to a VP of engineering role at a scaleup later, he’d try to help them grow into a people manager. Everyone eventually leaves even the fastest-growing scaleups, or the most coveted Big Tech. 

A smaller group departs into retirement, more commonly at companies like Microsoft and Amazon, where some engineers spend decades. But most people leave for other companies.

I hope the half dozen accounts from tech professionals who left Big Tech provide a sense of why people decide the most prestigious workplaces in tech are not for them.

Working at Big Tech can make leaving it much easier. This is counterintuitive because Big Tech pays so well, and the biggest reason against leaving is the compensation cut – at least in the short-term. However, the high pay allows people to save up a nest egg much faster, which provides the financial freedom to do something more risky like joining a startup and betting that the equity package will grow in value, or just taking a pay cut to join a company with more interesting work, or which they are passionate about.

Some people never stop growing professionally. A common theme in these accounts is feeling stagnant; most people felt they weren’t growing or being challenged. Some left because of frustration about doing more administrative busywork and less building.

Working at Big Tech is often a final goal, but a job in this elite group of workplaces can also be a stepping stone for pursuing new ambitions. I hope these accounts shed some light on the decision-making process and serve as a reminder that engineering careers are also about the journey, not just the destination.

Monday, 14. October 2024

Doc Searls Weblog

What goes in these structured wiring cabinets?

I need to install gear in these two structured wiring cabinets in the garage of the new house we are finishing. I don’t know exactly what to put in them and seek advice. The installed cables are: Blue CAT-6a Ethernet cables go to outlets (RJ-45 jacks) in four rooms. Internet will come from the city’s […]
Two in-wall structured wiring cabinets in our new garage.

I need to install gear in these two structured wiring cabinets in the garage of the new house we are finishing. I don’t know exactly what to put in them and seek advice.

The installed cables are:

Blue CAT-6a Ethernet cables go to outlets (RJ-45 jacks) in four rooms. Internet will come from the city’s new fiber optic system. Coaxial cables go to four possible TV locations. They will carry signals from the over-the-air TV antenna that will go on a pole outside. We will not have cable TV or Internet service.

Soon to be installed are:

A coaxial cable from the TV antenna. A fiber cable from the street.

Both will come underground and up into the garage through a conduit between the back of the house and the hole (not visible) in the left side of the left cabinet.

So here is what I think I need:

A patch panel for the coaxial cables, so I can either distribute the TV signal or patch it through to one TV at a time. An ONT (optical network terminal) to serve as the router between the fiber from outside and the four ethernet connections inside.

Here is what I don’t know:

What the fiber provider (GigabitNow Bloomington) will provide. I know they will need to terminate fiber from the street with an ONT that will go in one of the cabinets, but I don’t know whether they will provide, or I will need to get, a way to distribute service to the four Ethernet lines. I see ONTs sold online that have four Ethernet ports, but I don’t know if the provider will welcome my getting one or not.

Whether I will need an in-line TV signal amplifier. My antenna, which I’ve already built and tested, has an amplifier already. The question is whether I will need more amplification, especially if I am distributing to more than one room, and not just patching through one room at a time.

Exactly what the best practices are for attaching devices to the inside of these cabinets.

So that’s it for now. Thanks for any input you’ve got.

 


@_Nat Zone

Digital Identity: Latest Global Developments – October 15, 2024

This week’s report explores the latest developments in digital identity around the world. From new cybersecurity tools in the United States to digital ID programs in developing countries, countries everywhere are advancing digital transformation across a wide range of areas. United States: SpyCloud’s new cybercrime investigation tool…

This week’s report explores the latest developments in digital identity around the world. From new cybersecurity tools in the United States to digital ID programs in developing countries, countries everywhere are advancing digital transformation across a wide range of areas.

United States: SpyCloud’s new cybercrime investigation tool

SpyCloud, a leader in cybersecurity solutions, has announced a groundbreaking new capability within its SpyCloud Investigations portal. Called IDLink advanced analytics, the new feature is designed to build identity analytics directly into cybercrime investigations.

Key benefits of IDLink:

Accelerated insider threat analysis
Stronger supply chain risk assessment
Improved identification of threat actors

The tool is expected to be a game changer for the following teams:

Cyber threat intelligence (CTI) teams
Security operations
Fraud and risk prevention analysts
Law enforcement

By providing rapid insight into how exposed identity data connects to potential risks, IDLink aims to significantly improve the efficiency and effectiveness of cybercrime investigations.

United States: NIST recognition of Yoti’s facial age estimation

The US National Institute of Standards and Technology (NIST) has confirmed the strong accuracy of Yoti’s facial age estimation (FAE) model. The model achieved notable rankings in two important categories:

First place for accuracy on 13-16 year olds in the “child online safety” category
Second place for accuracy on “mugshot” images of 18-30 year olds

Yoti reports that it optimized its FAE model specifically for selfies taken on mobile phones. This optimization produced significantly higher accuracy when measured against Yoti’s internal test data.

The recognition from NIST underlines the potential of facial age estimation technology, particularly for strengthening online safety measures for younger users.

Fiji: National digital ID program moves forward

Fiji has begun developing an ambitious national digital ID program. Key details:

Expected timeline: at least 26 months to completion
Leadership: a steering committee comprising Deputy Prime Minister Manoa Kamikamica, Professor Biman Prasad (Deputy Prime Minister), Reserve Bank Governor Ariff Ali, and representatives of other ministries

Fiji’s cabinet approved the project in June, an important step toward establishing a comprehensive national digital ID system. The system is intended to serve the needs of both government and the private sector, and could transform how Fijians interact with a range of services and institutions.

Comoros: Digital public infrastructure project

The Union of the Comoros is preparing to launch a digital public infrastructure (DPI) project with substantial funding from the African Development Bank (AfDB). Funding and project details:

Total funding: 9.51 million euros (about 10.4 million US dollars), with 4.02 million euros from the African Development Fund and 5.49 million euros from the bank’s Transition Support Facility

Project name: “Project to Support the Digitalization of the Comorian Economy”

Main objectives:

Improved governance
Better quality, affordability, and accessibility of public services
Establishment of digital government systems

Specific deliverables:

Construction and operation of a new data center
Upgrade of the existing secondary data center
Creation of an incubator for digital innovation

The total cost of the project is estimated at 22.53 million euros (25 million US dollars), suggesting there are additional funding sources and future investment plans.

Global: Rise in AI-driven identity fraud

Signicat’s 2024 report “The Battle Against AI-Driven Identity Fraud” reveals alarming statistics about the growth of AI-driven fraud:

42.5% of fraud attempts detected in the financial and payments sector used AI
An estimated 29% of these AI-driven attempts were successful

These figures highlight the growing sophistication of fraudsters’ methods and the urgent need for advanced fraud detection and prevention measures in the financial industry.

European Union: Digital travel credential initiative

The European Commission has taken an important step toward digitizing travel documents in the Schengen area. Two proposals for digitizing passports and ID cards have been adopted, collectively referred to as the “EU Digital Travel application”. The initiative applies to both EU and non-EU citizens.

Key features of the proposals:

A common framework for the use of digital travel credentials
A new “EU Digital Travel application” for travelers to create and store digital travel credentials

Benefits:

Easier and more secure travel to and within the Schengen area
Streamlined border-crossing processes, moving away from today’s physical checks at the border

Implementation:

The application will be developed by the European Commission with the support of eu-LISA and provided at EU level
It will be available to all travelers holding a biometric passport or an EU ID card

The initiative marks a major shift toward digital-first travel documents in Europe, and could set a precedent for other regions.

Cyprus: Digital ID card rollout

Cyprus has approved the purchase of 100,000 digital ID cards, advancing its digitization efforts. Deputy Minister of Research Nicodemos Damianou announced the following details:

The first 30,000 cards will be distributed free of charge
The remaining 70,000 will be issued at 15 euros each
The digital ID card has been approved by the European Commission
Citizens will gain access to a range of digital government services

The move signals Cyprus’s determination to modernize its identity infrastructure and align with broader European digital identity initiatives.

France: Digital health card integration

France plans to digitize its health insurance card, known as the “carte Vitale”, and integrate it into the France Identité app. Key points of this development:

Planned launch: first quarter of 2025
Current function: by presenting the physical carte Vitale, users can access the French healthcare system and receive reimbursement of medical costs
Future function: users will be able to present their health card digitally through the app

The France Identité app already holds important documents such as the electronic national ID card and the driving license. Adding the health card further centralizes identity information and data management for French citizens. Sopra Steria, Atos, iDAKTO, and Idemia provide the technology behind France’s digital identity scheme.

Moldova: Alignment with EU digital identity standards

Moldova is working to align its digital ID regulations with European Union standards, with a particular focus on compatibility with the EU Digital Identity (EUDI) Wallet. To achieve this goal:

Moldova has partnered with the Estonian software company Cybernetica
The project is scheduled to conclude in October

This alignment shows Moldova’s commitment to integrating with the European digital identity ecosystem, and could make cross-border digital interactions smoother.

Kazakhstan: National biometric authentication system

Kazakhstan is moving ahead with establishing a comprehensive national system for remote biometric authentication. The system is intended to serve as a unified database of biometric data across all industry sectors in the country.

Key organizations involved:

National Information Technologies JSC (Nitec), operator of Kazakhstan’s e-government infrastructure
BTS Digital, a Kazakh digital ID startup

The initiative reflects Kazakhstan’s ambition to modernize its identity infrastructure and streamline the many services that require secure authentication.

Closing thoughts

As these developments show, countries around the world are making significant progress on digital identity and related technologies. From strengthening cybersecurity measures to digitizing travel documents and healthcare systems, these initiatives are reshaping how individuals use government services, travel, and manage their personal information in an increasingly digital world.

(Source) https://www.thinkdigitalpartners.com/news/2024/10/14/digital-identity-global-roundup-187/

Sunday, 13. October 2024

Michael Ruminer

AI “it’s got legs”

Compliments of FLUX schnell This last week I spoke with two folks on AI intersection within their work and how their work intersects with one another. One was a CPO for a startup that is bringing AI into their life sciences products in a forefront way. The other was a compliance manager for a pharma company. The second happens to be my wife. For me, their work intersects. They both certainly g
Compliments of FLUX schnell

This last week I spoke with two folks about how AI intersects with their work, and how their work intersects with one another’s. One was a CPO for a startup that is bringing AI into their life sciences products in a forefront way. The other was a compliance manager for a pharma company. The second happens to be my wife. For me, their work intersects. They both certainly got me thinking harder about AI agents. The first of the two is focused on breaking down knowledge silos in organizations, and I definitely saw a play for AI agents, and the use of tools by those agents, in their approach. The second needed some exuberant discussion on the potential of how AI can contribute to their organizational goals and needs; and without mansplaining, I was glad to help. :-) The conversations only reinforced in me where we are headed in AI adoption.

There were discussions on trend analysis, small language models, retrieval-augmented generation (RAG), and how AI agents and tools work to bring different knowledge repositories into a single interface with human consumable outputs, just to name a few topics. Surprisingly or not, a lot of these topics were not with the CPO but with the wife; putting the theory into practical examples. It was motivational to me.

I have been looking for good, practical, real-world examples of AI agentic tools to work on as knowledge-expanding exercises; whether they work or, in the end, surface the immaturity of the technology. I don’t get especially motivated by the common and tired examples of financial analysis, LinkedIn scraping for recruiting, or travel booking scenarios that every agent example seems to rely on. These discussions were a step forward.

These topics reinforced for me that for AI and its agents — “it’s got legs”.

Saturday, 12. October 2024

Heres Tom with the Weather

Thursday, 10. October 2024

Jon Udell

Mix Human Expertise With LLM Assistance for Easier Coding

There are plenty of ways to use LLMs ineffectively. For best results, lean into your own intelligence, experience, and creativity. Delegate the boring and routine stuff to closely supervised assistants whose work you can easily check. Mix Human Expertise With LLM Assistance for Easier Coding Part of the LLM series at The New Stack.

There are plenty of ways to use LLMs ineffectively. For best results, lean into your own intelligence, experience, and creativity. Delegate the boring and routine stuff to closely supervised assistants whose work you can easily check.

Mix Human Expertise With LLM Assistance for Easier Coding

Part of the LLM series at The New Stack.


The Pragmatic Engineer

The Pulse #110: VC-funded companies acting more like bootstrapped ones?

Also: first-ever double Nobel Prize wins for AI research, and an interesting cloud platform price comparison startup built on a budget

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. Google to be broken up? NVIDIA may compete with cloud providers, Twilio commits to full-remote work, third-party app stores coming to Android, and more.

VC-funded companies being more bootstrapped? More seed-stage VC-funded companies aim to become profitable fast in order to avoid future fund raising rounds. It’s like how bootstrapped companies operate, and reduced levels of available funding will keep this approach popular.

First-ever double Nobel Prize wins for AI research. Both the physics and chemistry Nobel prizes have been awarded for AI-related work. And three of the five prize winners have Google connections.

Interesting startup idea: benchmarking cloud platform pricing. A team of 3 developers built a neat platform called Spare Cores that makes cloud instance pricing more transparent. They currently track 275,000 up-to-date server prices across 4 cloud providers – and built all of this on top of a €150K ($165K) EU innovation grant in ten months.

1. Industry pulse

Google to be broken up?

In August, a US federal judge ruled that Google broke antitrust law by paying other vendors to be the default search engine, in a case brought by the Department of Justice (DOJ). Now, the process moves on to deciding the penalty. The DOJ is suggesting a break-up of the search giant, in order to end the company’s monopoly in search. 

The DOJ is seeking “behavioral and structural remedies that would prevent Google from using products such as Chrome, Play, and Android to advantage Google search and Google search-related products and features.”

It is down to the judge to decide the penalty, and whether to take the drastic decision to break up the tech giant, in the way the US government broke up oil companies in the early 20th century. The last time it happened to a Big Tech company was in 2000, to Microsoft. Back then, the business appealed and got the decision overturned in 2001.

It reads like the DOJ will push for Search to be a separate company, and for products like Chrome, Play, and Android to operate independently as a separate unit. It is a fair question whether products like Chrome would be viable without being subsidized by the profitable ads business. Microsoft and Apple both subsidize their browser development from other parts of their businesses. 

If Chrome, Android, and Play became a separate entity, then search engines like Google, Bing, DuckDuckGo, and other search businesses could pay this entity to be the default search engine for a limited time, or for a region.

Google will no doubt fight hard and do whatever it can to avoid being broken up.

NVIDIA to compete with AWS, GCP, and Azure on GPUs?

Read more


Werdmüller on Medium

It turns out I’m still excited about the web

The tech industry has gone down a dark path, but there are glimmers of hope. Continue reading on Medium »

The tech industry has gone down a dark path, but there are glimmers of hope.

Continue reading on Medium »


@_Nat Zone

Slides from my talk at the Information Security Workshop in Echigo-Yuzawa 2024

Today (October 10), I gave a talk titled “Blind Spots in ID Management: Why Aren’t Threats Decreasing?” at the Information Security Workshop in Echigo-Yuzawa. Here is the final version of the slides. Please have a look.

Today (October 10), I gave a talk titled “Blind Spots in ID Management: Why Aren’t Threats Decreasing?” at the Information Security Workshop in Echigo-Yuzawa.

Here is the final version of the slides.

Please have a look.

2024-10-10_ID管理の死角 (Blind Spots in ID Management)

Wednesday, 09. October 2024

Justin Richer

GNAP: A Conversation of Authorization

After five years of standardization work, GNAP is now officially RFC9635! This long and intense process actually started a few years prior to that, when I was talking with a lot of folks in the security industry about some of the shortcomings of OAuth 2.0, and what we could do about them as an industry. These conversations led to the XYZ proposal (and implementations) which eventually led to the f

After five years of standardization work, GNAP is now officially RFC9635! This long and intense process actually started a few years prior to that, when I was talking with a lot of folks in the security industry about some of the shortcomings of OAuth 2.0, and what we could do about them as an industry. These conversations led to the XYZ proposal (and implementations), which eventually led to the formation of the GNAP working group, along with a bunch of others. In particular, I want to recognize the work that Fabien Imbault, Yaron Sheffer, Leif Johannsen, and Aaron Parecki put into the documents and conversations in the working group over these years.

I’m really proud of what we’ve built in GNAP. One of the core tenets of GNAP was to look at the world of OAuth and surrounding technologies and figure out how we could do a lot of that better. It’s been great to see GNAP getting applied in a bunch of places over the web, from payments to key management, and especially in places where OAuth doesn’t reach as well. While OAuth remains deeply entrenched across the world, and likely will be for some time, the community has learned many things from GNAP. A lot of things that started in GNAP have been making their way back to the OAuth ecosystem in some form.

The most obvious of this is RFC9396: OAuth Rich Authorization Requests. This replacement of OAuth’s scope parameter was a direct and intentional backport of what became GNAP’s resource access rights, which also acronyms to RAR. In the OAuth world, we don’t get some of the clean features of GNAP, like being able to substitute strings for objects as a shorthand, but a lot of the core enhancements are there.
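For readers who have not seen RAR, here is a rough illustration in Python of the kind of structured request it enables. Only the type member is required by RFC 9396; actions, locations, and datatypes are optional common fields, and all of the values here are made up for the example rather than taken from GNAP or the RFC.

import json

# Illustrative only: a rich authorization request replaces a flat OAuth scope
# string with a structured authorization_details array (RFC 9396).
authorization_details = [
    {
        "type": "photo-api",                               # required
        "actions": ["read", "write"],                      # optional
        "locations": ["https://photos.example.com/api"],   # optional
        "datatypes": ["metadata", "images"],               # optional
    }
]

# GNAP's requested-access structure is similar, with the extra shorthand that
# a bare string can stand in for a full object.
print(json.dumps({"authorization_details": authorization_details}, indent=2))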

We’re also seeing yet another intent registration addition to OAuth 2 (on top of the pushed authorization request, device grant type, and CIBA extensions), and this one mimics a lot of the flexibility of GNAP’s interaction system. It’s a more narrow use case in the OAuth specification, but it’s clear that the pattern that GNAP was built on is here to stay.

And then there’s RFC9421: HTTP Message Signatures. This is work that started independently from GNAP but grew up around the same time, and GNAP utilizes HTTP Message Signatures as a core security function. I don’t think we’d have gotten the signing spec to be as robust as it is without some of the GNAP key proofing use cases driving the discussion.

And finally, the GNAP Resource Servers document has just passed IESG review and is on its own way to becoming an RFC as well. This document represents key abstractions in how an RS and an AS relate to each other, and I hope we can continue to build this out and pull the best ideas out into the world.

The GNAP working group is shutting down now that its core work is done, but GNAP is far from over. I look forward to seeing it grow into its spaces, and serve as a beacon of how a delegation protocol can be engineered and built.


Heres Tom with the Weather

We fix the fucking networks

“A lot of us remember what it was like to live and work on an Internet that was deeply flawed but not systematically designed to burn our emotions and time and safety for fuel.”

“A lot of us remember what it was like to live and work on an Internet that was deeply flawed but not systematically designed to burn our emotions and time and safety for fuel.”


Just a Theory

PGXN v2 Update

A lot has happened in the five months since the last PGXN v2 update. The time has been used for continuing community discussions, planning, designs, and the start of implementation. Read on for a full accounting.

Speaking of PGXN news, I neglected to link to this post I wrote for the Tembo Blog last month, a fairly detailed accounting of what’s been happening on the PGXN v2 project:

Forgive me Postgres community, for it has been five months since my last PGXN v2 Update. In my defense, it has been super busy! The time went into ongoing community discussions, planning, designs, and the start of implementation. Join me below for the lowdown.

A few highlights:

PGXN RFCs Repository and rfcs.pgxn.org
Binary Distribution POC and OCI POC
Extension Ecosystem Summit
API and ABI guidance
pgxn_meta v0.1.0
PGXN Meta JSON Schemas
project plan

There’s been quite a bit of activity since then, including the aforementioned PGXN RFC–5 — Release Certification. More soon!

More about… PGXN

PGXN Certifications RFC

A request for comments on a new PGXN RFC for signing releases, plus a link to an initial implementation.

A couple weeks ago, I drafted PGXN RFC–5 — Release Certification, which proposes to replace the simple inclusion of SHA-1 hash digests in PGXN release META.json files with a JWS-signed release payload. From the introduction:

This RFC therefore proposes to extend v2 distribution metadata with a single additional property, certs, that contains one or more certifications that attest to the authenticity or other characteristics of a release on PGXN.

The certs value is an object that contains at least one property, pgxn, which itself contains a PGXN-generated RFC 7515 JSON Web Signature in the JWS JSON Serialization format. The pgxn property will allow clients not only to assemble the release URL and verify the downloaded file against checksums, but also validate it against a public key provided by PGXN.

The design allows multiple signatures, certifications, or other attestations, which in the future MAY allow authors or other entities to sign releases with their own keys. The new format appends a structure such as this to the distribution META.json file:

{ "certs": { "pgxn": { "payload": "eyJ1c2VyIjoidGhlb3J5IiwiZGF0ZSI6IjIwMjQtMDktMTNUMTc6MzI6NTVaIiwidXJpIjoiZGlzdC9wYWlyLzAuMS43L3BhaXItMC4xLjcuemlwIiwiZGlnZXN0cyI6eyJzaGE1MTIiOiJiMzUzYjVhODJiM2I1NGU5NWY0YTI4NTllN2EyYmQwNjQ4YWJjYjM1YTdjMzYxMmIxMjZjMmM3NTQzOGZjMmY4ZThlZTFmMTllNjFmMzBmYTU0ZDdiYjY0YmNmMjE3ZWQxMjY0NzIyYjQ5N2JjYjYxM2Y4MmQ3ODc1MTUxNWI2NyJ9fQ", "signature": "cC4hiUPoj9Eetdgtv3hF80EGrhuB__dzERat0XF9g2VtQgr9PJbu3XOiZj5RZmh7AAuHIm4Bh-rLIARNPvkSjtQBMHlb1L07Qe7K0GarZRmB_eSN9383LcOLn6_dO--xi12jzDwusC-eOkHWEsqtFZESc6BfI7noOPqvhJ1phCnvWh6IeYI2w9QOYEUipUTI8np6LbgGY9Fs98rqVt5AXLIhWkWywlVmtVrBp0igcN_IoypGlUPQGe77Rw" } } }

Review and feedback would be very much appreciated, especially on the list of unresolved questions toward the end.

Thanks to David Christensen and Steven Miller for the early reviews!

Meanwhile, I’ve released pgxn_meta v0.4.0, which adds support for this format, as well as code to rewrite PGXN v1 release fields to the new format. It doesn’t actually do signature verification, yet, as the server back end hasn’t been updated with the pattern and PKI. But I expect to modify it in response to feedback and get it implemented in early 2025.

More about… Postgres PGXN RFC JWS

The Pragmatic Engineer

Efficient scaleups in 2024 vs 2021: Sourcegraph (with CEO & Co-founder Quinn Slack)

On today’s episode of The Pragmatic Engineer, I’m joined by Quinn Slack, CEO and co-founder of Sourcegraph, a leading code search and intelligence platform.

The Pragmatic Engineer Podcast had a very warm reception with the first episode — thank you! Two weeks later, here is episode #2, with a CEO who still codes every day: Quinn Slack at Sourcegraph.

Listen now on Apple, Spotify, and YouTube.

Brought to you by:

Paragon: ​​Build native, customer-facing SaaS integrations 7x faster.

Enterprise Ready Conference on 30 October: a one-day event in SF for product and engineering leaders shaping the future of enterprise SaaS.

On today’s episode of The Pragmatic Engineer, I’m joined by Quinn Slack, CEO and co-founder of Sourcegraph, a leading code search and intelligence platform. Quinn holds a degree in Computer Science from Stanford and is deeply passionate about coding: to the point that he still codes every day! He also serves on the board of Hack Club, a national nonprofit dedicated to bringing coding clubs to high schools nationwide. In this insightful conversation, we discuss:            

How Sourcegraph's operations have evolved since 2021

Why more software engineers should focus on delivering business value

Why Quinn continues to code every day, even as a CEO

Practical AI and LLM use cases and a phased approach to their adoption

The story behind Job Fairs at Sourcegraph and why it’s no longer in use

Quinn’s leadership style and his focus on customers and product excellence

The shift from location-independent pay to zone-based pay at Sourcegraph

And much more!

Takeaways

Some of my main takeaways from this conversation are these.

1. As software engineers, it’s increasingly important to understand what value you add to the business. A big difference between 2021 and 2024 is how companies are much more focused on efficiency: meaning they are hiring more conservatively, and less likely to fund teams with headcount that don’t contribute to the core focus of the company.

As a developer or manager, try to figure out how much your team contributes in revenue, savings, or other key goals for the company. Are you working in what the company would consider a profit center, or more of a cost center? We did a deepdive on this topic in The Pragmatic Engineer: check out the article, linked in the show notes.

2. AI tools are great for eliminating the toil that we developers face day-to-day. There are AI tools that position their goal as “replacing developers.” I appreciated that Quinn did not see this as the sensible path. His approach is to start by using AI tools on some of the “dumbest things”, like generating the changelog for a software release – I mean, assuming you generate a changelog. Then you take tedious tasks where these tools could help, and see if you can automate some more.

Do this one step at a time: and it will actually help devs and teams, and it’s a lot more achievable than saying “let’s replace this whole complicated workflow with AI.”

3. The reality of location-independent pay is that it stops being sensible above a certain company size. Sourcegraph was one of the few companies that offered the same base salary regardless of where people worked. They did this until they grew to about 200 people, then switched to a location-indexed model.

Quinn was honest about why they did it: keeping this model would not have made sense for the company from a business point of view. Basically, location-independent pay means the company can hire very easily in low-cost regions, but it’s hard or impossible to do so in high-cost regions. It also creates an incentive for employees to move to a low-cost region where they can save more. In the end, I don’t know of any company with more than 200 people that pays location-independent: all large companies have some kind of indexing on location, and the best companies just pay the top of the local market. We cover more about compensation in the deepdive on The trimodal nature of software engineering salaries.

If you enjoy the podcast, I’d very much appreciate if you subscribe on your favorite podcast player, and leave a review. It helps the podcast be discovered by more people. Thank you!

Show notes

Where to find Quinn Slack:

• X: https://x.com/sqs

• LinkedIn: https://www.linkedin.com/in/quinnslack/

• Website: https://slack.org/

In this episode, we cover:

(01:35) How Sourcegraph started and how it has evolved over the past 11 years

(04:14) How scale-ups have changed 

(08:27) Learnings from 2021 and how Sourcegraph’s operations have streamlined

(15:22) Why Quinn is for gradual increases in automation and other thoughts on AI

(18:10) The importance of changelogs

(19:14) Keeping AI accountable and possible future use cases 

(22:29) Current limitations of AI

(25:08) Why early adopters of AI coding tools have an advantage 

(27:38) Why AI is not yet capable of understanding existing codebases 

(31:53) Changes at Sourcegraph since the deep dive on The Pragmatic Engineer blog

(40:14) The importance of transparency and understanding the different forms of compensation

(40:22) Why Sourcegraph shifted to zone-based pay

(47:15) The journey from engineer to CEO

(53:28) A comparison of a typical week 11 years ago vs. now

(59:20) Rapid fire round

The Pragmatic Engineer deepdives relevant for this episode:

• Inside Sourcegraph’s engineering culture: Part 1 https://newsletter.pragmaticengineer.com/p/inside-sourcegraphs-engineering-culture
• Inside Sourcegraph’s engineering culture: Part 2 https://newsletter.pragmaticengineer.com/p/inside-sourcegraphs-engineering-culture-part-2

Referenced:

• Sourcegraph: https://sourcegraph.com/

• SharePoint: https://www.microsoft.com/en-us/microsoft-365/sharepoint/collaboration

• Google Code Search: https://developers.google.com/code-search

• Steve Yegge on LinkedIn: https://www.linkedin.com/in/steveyegge

• Shopify: https://www.shopify.com/

• Tobi Lutke still writes code for Shopify: https://news.ycombinator.com/item?id=25003574

• Gitlab: https://about.gitlab.com/

• Scott Kirby on LinkedIn: https://www.linkedin.com/in/jscott-kirby/

• Beyang Liu on LinkedIn: https://www.linkedin.com/in/beyang-liu/

• Perl: https://www.perl.org/

• Booking.com: https://www.booking.com/

• Loom: https://www.loom.com/

• CleanShot: https://cleanshot.com/

• Kazam: https://launchpad.net/kazam

• Ollama: https://ollama.com/

• StarCoder: https://huggingface.co/blog/starcoder

• Llama: https://www.llama.com/

• The Years of Lyndon Johnson: The Path to Power; Means of Ascent; Master of the Senate; The Passage of Power: https://www.amazon.com/Robert-Caros-Years-Lyndon-Johnson/dp/038535147X

• The Life Of Mahatma Gandhi: https://www.amazon.com/Life-Mahatma-Gandhi-Louis-Fischer/dp/1784700401

• Nimitz at War: Command Leadership from Pearl Harbor to Tokyo Bay: https://www.amazon.com/Nimitz-War-Command-Leadership-Harbor/dp/0190062363/r

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.


Just a Theory

⛰️ Postgres Ecosystem Summit EU

The sequel to the successful PGConf.dev event, the Extension Ecosystem Summit EU will showcase some exemplary extension use cases.

Given the success of the Extension Ecosystem Summit at PGConf.dev back in May, my colleague Floor Drees has organized a sequel, the Extension Ecosystem Summit EU on Tuesday, October 22, at the Divani Caravel Hotel in Athens. That’s “Day 0” at the same hotel as PGConf.eu. Tembo, Percona, Xata, and Timescale co-sponsor.

While the May event took the form of an open-space technology (OST)-style unconference aimed at extension developers, the EU event aims to inform an audience of Postgres users about the history and some exemplary use cases for extensions. From the invite:

Join us for a gathering to explore the current state and future of Postgres extension development, packaging, and distribution. Bring your skills and your devices and start contributing to tooling underpinning many large Postgres installations.

Jimmy Angelakos - pg_statviz: pg_statviz is a minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.

Adam Hendel (Tembo) - pgmq: pgmq is a lightweight message queue. Like AWS SQS and RSMQ but on Postgres. Adam is pgmq’s maintainer since 2023, and will present a journey from pure Rust → pgrx → pl/pgsql.

Alastair Turner (Percona) - pg_tde: pg_tde offers transparent encryption of table contents at rest, through a Table Access Method extension. Percona has developed pg_tde to deliver the benefits of encryption at rest without requiring intrusive changes to the Postgres core.

Gülçin Yıldırım Jelínek (Xata) - pgzx: pgzx is a library for developing PostgreSQL extensions written in Zig.

Mats Kindahl (Timescale) - TimescaleDB (C), [pgvectorscale] (Rust) and pgai (Python): maintaining extensions written in different languages.

I will also deliver the opening remarks, including a brief history of Postgres extensibility. Please join us if you’re in the area or planning to attend PGConf.eu. See you there!

More about… Postgres PGXN Extensions PGConf Athens Summit

Tuesday, 08. October 2024

The Pragmatic Engineer

What is Reliability Engineering?

A history of SRE practice and where it stands today, plus advice on working with reliability engineers, as a software engineer. A guest post by SRE expert and former Googler, Dave O’Connor

Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

For software engineers, the job involves more than just building software systems; these systems must also be reliable. This is easy enough for a website with a small number of visitors; but the larger the system gets, the trickier reliability is to achieve. There’s a huge amount of complexity involved in making an app or website with tens, or hundreds, of millions of daily users work reliably for (almost) all of them.

Google pioneered the concept of Site Reliability Engineering (SRE), and it has become a pretty mainstream discipline, with many mid-size-and-above tech companies having dedicated SRE or reliability teams. To find out more about SRE and reliability engineering in general, I reached out to SRE veteran Dave O’Connor. He joined Google in 2004, working with the team from which the SRE discipline had emerged just a year before, in 2003.

Today, Dave covers:

Common terms. Why it’s called ‘reliability engineering’ instead of SRE, DevOps teams, etc.

History. From conception in 2003 at Google, to industry convergence in the 2010s.

Reliability engineering today. This is a discipline in growth mode. But near-perfect reliability increases cost exponentially, and most companies should not copy Google’s approach to the SRE role.

Four promises. A well-run reliability team sets and delivers on four promises: SLA/SLO/SLI, service substrate management, tech design involvement, and tactical heroism.

Working with reliability engineers. If there are reliability engineers in your workplace, find out their approach to the four promises, involve them in tech design discussions, and seek their help with reliability challenges.

Future of reliability engineering. Tooling is evolving, and a trend of companies considering quitting the cloud and being more opinionated about reliability practice.

Further reading. Article and book recommendations.

For related reading, see also these The Pragmatic Engineer deepdives:

Shipping to production

Healthy oncall practices

What is Security Engineering?

What is Data Engineering?

What is ML Engineering?

With this, it’s over to Dave.

Hi, I’m Dave. I’ve been a site reliability engineer (SRE) for 20 years, since before many folks outside the Google ecosystem called it that. I joined the company in 2004, on a team tasked with (re)installing and (re)configuring machines in the fleet. 

We quickly realized that due to sheer numbers, it was not a job that could be done by humans at the then-current scale, let alone at the scale expected. At the time, the common capability to run and manage more than a few hundred machines simply didn’t exist. Then began the chartering of what became known as ‘site reliability engineering’ at Google. The rest, as they say, is history.

Several years later, that history started to be told in Site Reliability Engineering (I contributed chapter 29), and various publications thereafter. 

Since leaving Google in 2021 after 17 years as an SRE, I’ve led SRE and production groups at Elastic and Twilio, and I’m currently freelancing as a leadership practitioner for busy teams (SREs included), and as a coach for engineering leaders, focusing on reliability engineers. Check out my consulting services, and coaching practice.

1. Common terms

I use the term ‘reliability engineering’ in this article. This is because as a set of practices, it stands on its own and can be implemented via specialized ‘SREs’, the ‘DevOps’ model, or individually as software is built. Many publications go to great lengths to make this distinction, and the question of whether reliability should be done by specialized SREs, or everyone, is a resounding ‘it depends’. See my article 6 Reasons You Don’t Need an SRE Team for why I believe many companies don’t need a dedicated function.

As with any engineering specialization, anybody can do reliability engineering up to a point. The decision on hiring or building a dedicated SRE group is similar to the choice of whether to hire a dedicated QA, or an information security group. Does the business care enough about the outcomes to dedicate people and organizational headspace to it? Often, the answer is initially “no”. However, almost every enterprise encounters novel and domain-specific engineering challenges at some point, and in these situations dedicated SREs provide better outcomes, tailored to individual needs.

“Site reliability” as a name was coined by Google. The “site” in question was google.com. However, the role has grown since; for many years at Google I led the SRE group in charge of all the storage and databases. These were not ‘sites’ per se, but the name had stuck by that point. 

As with most nascent engineering functions, folks who do reliability engineering go by many titles:

SREs

Production engineers

DevOps

Platform engineering

DevSecOps

Platform trust

 Infrastructure engineering 

… and other verbal gymnastics! 

Reliability engineering goes by a lot of titles, the best-known being SRE

These titles all encompass pretty common practices. It’s also argued – correctly, if you ask me – that reliability engineering predates what Google did. Many of its common-sense or even specialized practices are taken from other disciplines. For example, the concept of the incident commander role for large-scale outages derives from the US Federal Emergency Management Agency (FEMA), founded in 1979. 

2. History

As computing has evolved and gone increasingly online, our needs have scaled beyond the capability of humans to perform tasks, or even to understand the scale at which we find ourselves doing things. When there are 100,000+ machines, it’s impossible to eyeball a graph of all data points, like servers or machines. We cannot take a well-understood administrative action on a machine if we have to replicate it 100,000 times.

In the early 2000s, many enterprises went from a manageable number of physical computers to large numbers of servers. These servers performed one function and were clustered, or they could take more generic workloads. 

As Gavin McCance, Compute Services lead at CERN, put it, we started thinking of individual machines as cattle, not pets:

Transition from ‘pets’ to ‘cattle’. Source: CERN Data Centre Evolution by Gavin McCance

In the early days, the frame of reference for how many machines counted as “a lot” shifted almost monthly. At Google in around 2006, I went on a site visit to another large tech employer in Ireland, during which our group peered through a small window into “the largest datacenter in Ireland”, numbering thousands of machines. Cue some raised eyebrows and polite ‘ahems’, because we knew we had a room with twice as many machines located half an hour’s drive away. 

Google’s scaling ability lay in being able to assemble and power machines, but also in being able to pour concrete and purchase supporting equipment like generators, when supply chains simply weren’t set up for our scale. It represented an enormous uptick in the amount of real, difficult, and unsolved engineering problems in delivering services. 

For 99% of enterprises, this kind of massive scaling up in servers isn’t (nor should be) a core competency. Hence the explosion in cloud computing: amassing the human expertise to understand and run all layers of the stack is far beyond the purview of most businesses. 

The early SRE role

At places like Google, it made sense to build the expertise and technology to cover all layers of the stack, from the frontend serving infrastructure and network links, back to the physical machines and power infrastructure on the ground. This was for reasons that can be linked back to one thing: scale.

When I joined the tech giant, the number of physical machines was in the process of sailing comfortably into six digits, crossing 100,000 and growing fast. By comparison, at most enterprises a couple of hundred machines was considered a pretty large footprint.

This had two major forcing functions, both related to scale:

Tooling at the time was simply not set up to handle the scale at which Google needed to operate. There were no datacenter-scale tools (where ‘datacenter’ now means many thousands of machines) that could install, configure, or administer machines at our scale. At the control plane level, there were no job or workload management systems suited to the task. Machine virtualization was already in use in many folks’ production setups, but containerisation, Kubernetes, Terraform, Prometheus, and many other familiar components didn’t exist yet. In fact, Kubernetes and Prometheus are based, in part, on later internal Google tools. 

Also basically non-existent were SaaS options such as PagerDuty, and observability tools like Datadog and Honeycomb. 

It would take entire buildings and campuses full of people to manually operate the machines. We were capable of hiring a lot of people, but we knew that without a giant shift in technology and tooling, the existing ways of operating machines weren’t going to work. For places like Google, this meant we’d quickly run out of people able to deal with the complexity and workload. We needed toolchains and practices beyond what the industry could provide. 

An additional forcing function for us was Google leadership’s almost fanatical desire for reliability and speed. Larry, in particular, cared a lot. Just as Gmail was launching and offering users a previously-unthinkable entire gigabyte of email storage, we were aiming for levels of precision and speed in serving content that were unheard of in most industries.

The fervent belief of Google’s founders was that speed and reliability mattered more than features. This belief was coupled with the understanding that we couldn’t achieve it traditionally, which made it an existential issue. The level of investment in building out all layers of the serving stack was a case of “because we can”, but also “because we have to, as nowhere else does what we need”.

There was never a question of whether traditional ‘ops’ would work at Google. We needed a specialized role, staffed by folks familiar with the problem space and engineering methods required to make it work. 

In 2003, the SRE role was born. Ben Treynor Sloss had been tasked with building Google’s “production team” and in his own words, he built “what happens when you ask a software engineer to design an operations team.” This turned into the birth of the SRE function at Google. From the outset, SRE was staffed in varying measures by systems/operations experts and software engineers. A large part of the remit of the team was to build the tools and practices required to operate Google’s fleet.

I joined as one of the first non-US SREs, based in Ireland. My background is in systems administration, and my previous employer’s machine fleet numbered somewhere in the high double-digits. I was one of the newer, specialized breed of “sysadmins who code”. We didn’t have a snappy name, but did have the drive to embody the three virtues of ‘laziness, impatience and hubris.’ 

When I joined, my first gig was ‘babysitting’ Gmail’s machine fleet. Basically, the job was to ensure there were enough machines to meet storage and serving needs, and to juggle decisions about waiting for tools and processes to catch up versus building them ourselves. Many of our practices for working in distributed teams spanning up to nine time zones came from this early experience of collaborating with our SRE and product development counterparts in Mountain View and other offices.

Industry Convergence

Eventually, other companies, especially the hyperscalers, caught on to the scaling issues. Each had its own approach, but over time the notion grew industry-wide that making things reliable was a real-life engineering discipline, not simply ‘ops’.

This step saw a number of terms coined to describe this engineering, including ‘DevOps’. At its core, this was the notion that the disciplines and practices of reliability engineering should be ingrained into the overall engineering organization. At places other than Google, this mostly took the form of combined developer/operations roles (i.e. “you build it, you run it”), which differed from Google’s implementation, but the practices were similar.

Around this time, Google started opening up about SRE, eventually publishing the first SRE book and its follow-ups. Conferences such as USENIX SRECon and DevOpsDays, along with other movements, have solidified reliability engineering as a discipline that scales well beyond Google. Indeed, the company has become a consumer of many state-of-the-art developments.

3. Reliability Engineering Today

Reliability engineering is still in its growth and adoption phase. Unreliable software, and systems that are slow or function incorrectly, are no longer tolerated by businesses or consumers. Fast, reliable internet access is becoming ubiquitous, and the services people use must be the same.

But aiming for near-perfect reliability scales costs exponentially. It’s estimated, based on experiences at AWS, that every “nine” of additional guaranteed availability (the difference between 99%, 99.9%, and 99.99% uptime) scales overall costs by roughly ten times. This includes staffing, development, and testing costs, and may only partially account for the opportunity cost of a necessarily slower release cycle. But slower release cycles aren’t for everyone! If you’re in a product space which can’t tolerate spending too much time and energy on testing and resiliency, the right answer may well be to aim lower.
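
To make the “every extra nine costs roughly ten times more” estimate concrete, here is a small illustrative Python sketch. The cost multipliers are just the rough rule of thumb quoted above, not real data; only the downtime budgets are straightforward arithmetic.

MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year at a given availability level."""
    return MINUTES_PER_YEAR * (1 - availability)

base_cost = 1.0  # arbitrary cost unit at 99% availability
for nines, availability in enumerate([0.99, 0.999, 0.9999, 0.99999], start=2):
    cost = base_cost * 10 ** (nines - 2)  # the rough "10x per nine" estimate above
    print(f"{availability:.3%} uptime: "
          f"{downtime_minutes_per_year(availability):8.1f} min/year of downtime, "
          f"~{cost:,.0f}x relative cost")

Running this shows the budget dropping from roughly 5,256 minutes per year at 99% to about 5 minutes per year at 99.999%, which is the intuition behind treating each additional nine as a step change in cost rather than an incremental one.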

It’s sensible practice for any organization to explicitly state how much they care about reliability. Know that it is not a race to the top: be realistic about balancing the price you’re willing to pay, with the reliability the business needs! 

In the end, business outcomes win. I have been in several versions of the same meeting where a product owner demands a certain number of ‘nines’ of availability, but when probed on what the business outcome of falling below this target is, they don’t really have an answer. This especially applies to non-serving systems and data pipelines, which can be behind in processing by several hours with no ill effects. However, it’s often seen as easy or necessary to demand to-the-minute SLAs from the outset, without reference to the ‘North Star’ of business needs.

As in disciplines such as security, there is a tradeoff. The downside when things go wrong is bad, but we don’t have carte blanche to do absolutely everything for extra reliability. It is possible to err in either direction: neglecting these areas entirely and risking the enormous downside of a security incident or outage, or pursuing reliability goals at the expense of the core mission.

Try to avoid implementing what Google does for its SRE practice wholesale. One of the authors of the original SRE Book, Niall Murphy, famously tore up a copy of it during a keynote at SRECon in 2022. Far from disavowing the book’s content, he was sounding a note of caution about copying what Google does without adaptation.

Absorbing Google’s approach effectively is not about copying as much of it as possible, it’s about discovering which elements make sense for you and what you’re building.

Since departing Google, I’ve worked with more than one company with its own “SRE Book Club”. This is great, because the book contains a lot of knowledge. However, I never saw rooms full of database admins poring page-by-page over database design publications and deciding which features to adopt wholesale in their own setup, and the same caution applies to a modern SRE practice. The Google model of building everything yourself is just one way. It worked in the 2000s, but likely wouldn’t work today.

The availability of technology and products that didn’t exist when Google was building SRE makes the tradeoffs a lot more understandable and explicit, in hindsight. I go into more detail on this topic in my article, “A Short History”.

4. Four promises

At its core, reliability engineering is this:

Read more


Michael Ruminer

Sometimes It’s All About The Chunking


As I continue my study and experimentation with coding up AI solutions, and especially, at the moment, with Retrieval-Augmented Generation (RAG), I decided to work with a post from the Metadocs blog titled “Simple Agentic RAG for Multi Vector stores with LangChain and LangGraph”. It seemed it would cover two areas of interest, agentic operations and RAG. Little did I expect to learn a valuable lesson in chunking. In this post I’ll pass along the obvious but well demonstrated lesson I gathered in the process.

It started with the aforementioned post, which referred to an earlier post it built upon. Following a link to that prerequisite post, “Create a Langchain app with multiple vector store the easy way”, I found that it in turn referenced an even earlier post as a prerequisite. So down that rabbit hole I went. The earliest of the three posts was “Deploy a RAG app with Langchain in minutes”. I read it, found it to be a very simple RAG application, and coded it up, making sure I understood each line along the way. Most notable was that it split the text into chunks on “\n\n”. I hadn’t looked at the source document they were providing as a sample. It turned out to be a text file of a US State of the Union address with a twist: each sentence was followed by “\n\n” (two newline characters), an interesting if unrealistic format.

I have my own example PDF that I have been using to test various RAG implementations against a specific prompt, so I copied two paragraphs from the document containing the context I was after and formatted them with a “\n\n” after each sentence. Normally, I extract the text from the PDF in code and then chunk it, usually with recursive character text splitting, but I didn’t want to change this program since I was going to build on it. When done, the results actually returned what I was after: a list of the 10 principles of SSI.

For no apparent reason, I decided to edit the text file and format it with a single “\n” after each paragraph. This should have returned the same results once I edited the text split to match. It didn’t. I was, and still am, perplexed by this. It makes no sense that a double line feed split should return different results than a single line feed split under the circumstances. I plan to revisit this, as I believe I must be wrong despite trying multiple times. What was most important in the process was that with the right chunking, as simple as it was, I got the desired results when prompted, whereas in all my past experiences it had failed. There were differences: I was reading a text file rather than using a PDF text extractor, and I was using only two paragraphs focused on the context I wanted rather than 70 pages of an academic paper that is probably very hard to extract cleanly, even if the relevant context was two clean paragraphs within that PDF. The real lesson for me is how important chunking is. I suspect the major differentiator in the success was the chunk divisions, though I won’t rule out the simpler source material as a contributor.

Next, I plan to try a few things in this naive RAG implementation before I move on to the multi-vector store: a PDF created from just the two paragraphs that contain the needed context, split on paragraphs (“\n”), to see how that comes out. I’ll also try the two paragraphs in a PDF with the text extracted and chunked using RecursiveCharacterTextSplitter with separators=[“\n\n”, “\n”, “. “, “ “, “”], a chunk size of 1000, and two different overlap settings (0 and 200), as well as with SentenceTransformerEmbeddingFunction and then the default OpenAI embedding function. Let’s see how all those combinations work.
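
For reference, here is a rough Python sketch of the two chunking approaches in play: the naive split on “\n\n” versus the recursive splitter with the parameters above. The file name is a placeholder, and LangChain’s module paths vary between versions, so treat this as a sketch rather than a drop-in snippet.

# Assumes a LangChain install where these splitters are importable;
# newer releases expose them from the langchain_text_splitters package instead.
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

with open("ssi_principles.txt") as f:  # hypothetical sample file
    text = f.read()

# Naive split: effectively one chunk per sentence, relying on the "\n\n" formatting.
naive_splitter = CharacterTextSplitter(separator="\n\n", chunk_size=1000, chunk_overlap=0)
naive_chunks = naive_splitter.split_text(text)

# Recursive split: falls back through the separators until chunks fit the size limit.
recursive_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n", ". ", " ", ""],
    chunk_size=1000,
    chunk_overlap=200,
)
recursive_chunks = recursive_splitter.split_text(text)

print(len(naive_chunks), len(recursive_chunks))

Comparing the chunk counts and boundaries from the two splitters is a quick way to see how much the retrieval context actually changes between the runs.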

To recap: though I can’t explain why I got wildly different results depending on the parsing character once the text file format changed, I suspect the simple chunking by sentence made a lot of difference. The other likely impactful factor was clean, simple text versus PDF-extracted text. I plan to experiment more and will report back the results. A takeaway for me, even if it was not entirely or even primarily the impactful element, was how important chunking is for good results.


@_Nat Zone

The password requirements in NIST SP800-63B-4 2pd have been in place since 2017; what is new is the phishing-resistance requirement


Articles and tweets along the lines of “‘Don’t force periodic changes’: NIST publishes the second public draft of its password policy guidelines” have been making the rounds, along with the narrative that the “ban on periodic password changes” is something new. Let me say this: NIST SP 800-63 Revision 3, published in June 2017, already said exactly that. It is not new. The same goes for the rules on password character composition.

According to the NIST SP 800-63B-4 2pd guidelines, the password requirements include the following:

Length: Passwords must be at least 8 characters, with a recommended minimum of 15 characters. The maximum accepted password length must be at least 64 characters.

Accepted characters: ASCII printing characters, the space character, and Unicode characters should be accepted. Each Unicode code point counts as one character when evaluating password length.

Composition rules: Composition rules, such as requiring a mix of character types, should not be imposed.

Blocklist: Passwords must be compared against lists of commonly used, expected, or compromised passwords. The comparison is made against the entire password, not substrings.

Storage: Passwords must be stored using an appropriate salted and hashed password hashing scheme.

Change and complexity: Periodic password changes must not be required; however, a change must be required when compromise is detected.

Usability: Guidance for choosing strong passwords should be provided, and when a password is rejected, the reason should be communicated.

These guidelines balance security with user convenience.
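
As a rough illustration only, a verifier-side check following the rules summarized above might look like the Python sketch below. The blocklist contents, limits as constants, and return values are made up for the example; this is not an implementation of the guideline itself.

MIN_LENGTH = 8          # absolute minimum length
RECOMMENDED_MIN = 15    # recommended minimum length
MAX_LENGTH = 64         # verifiers must accept at least this many characters

BLOCKLIST = {"password", "12345678", "qwertyuiop"}  # illustrative only

def check_password(candidate: str) -> tuple[bool, str]:
    # Count Unicode code points, not bytes, when measuring length.
    length = len(candidate)
    if length < MIN_LENGTH:
        return False, "too short"
    if length > MAX_LENGTH:
        return False, "exceeds maximum accepted length"
    # Compare the whole password against known weak or compromised values;
    # note that no character-composition rules are applied.
    if candidate.lower() in BLOCKLIST:
        return False, "matches a commonly used or compromised password"
    if length < RECOMMENDED_MIN:
        return True, "accepted (shorter than the recommended 15 characters)"
    return True, "accepted"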

In addition, for systems at moderate or higher risk, passwords alone are not acceptable; AAL2 imposes requirements including the following.

Authenticator management: High confidence is required that the claimant controls one or more authenticators bound to the subscriber account. Proof of possession and control of two distinct authentication factors is required through secure authentication protocols.

Cryptography: Approved cryptography must be used. Authenticators used at AAL2 must be approved cryptographic authenticators.

Replay resistance: At least one of the authenticators used at AAL2 must be resistant to replay attacks.

Authenticated channels: Communication between the claimant and the verifier must take place over one or more authenticated protected channels.

Use of biometrics: When a biometric factor is used, it must meet specific performance requirements.

Phishing resistance: Verifiers must offer at least one phishing-resistant authentication option. Federal agencies must require their staff, contractors, and partners to use phishing-resistant authentication to access federal information systems.

What actually deserves attention as new is this “phishing resistance” requirement. In effect, it is correct to read it as ruling out not only password-only authentication but also password + OTP.

Monday, 07. October 2024

FACILELOGIN

The lessons learned in building a secure, scalable, and reliable identity platform @ DevRev

Image credits: https://www.networkrailmediacentre.co.uk/resources/ap-0409

At DevRev, we are building an API-first dev-centric platform that gives you one unified view of your customers and products by connecting your data, teams, and users to our AgentOS platform and app experiences. This blog post shares some insights into how we built (and keep evolving) a secure, scalable, and reliable identity platform at DevRev. The decisions we made and the lessons we learned throughout this journey, we believe, would be equally applicable to most SaaS companies.

I would like to acknowledge and thank Akanksha Deswal, and the DevRev identity team, who were part of this journey, and instrumental in figuring out what works best for us.

An identity platform serves as a secure bridge between users and the services they access — whether apps, APIs, or microservices — delivering a seamless and trustworthy way to manage, verify, and protect user identities in increasingly interconnected environments. While the expectations for an identity platform can vary depending on specific use cases, at DevRev, we primarily focused on the following core areas.


The decision to buy or build components of an identity platform depends on several factors, such as business requirements (both functional and non-functional), cost at scale, available resources, in-house expertise, and more. We wanted to maintain complete control over our object model (organizations, users, groups, etc.) while delegating more generic identity capabilities to an external identity service (identity provider). Having control over the foundational object model is essential for innovation, and allowing a third party to dictate its structure or limitations can stifle growth and introduce significant risks.

A few years ago, I enjoyed reading Ask Your Developer: How to Harness the Power of Software Developers and Win in the 21st Century by Jeff Lawson. In the book, Jeff reframes the classic ‘build vs. buy’ debate into a more urgent choice: ‘build vs. die.’ As every company evolves into a software company, the true competitive advantage lies in the software they build. When software becomes the critical interface between our services and our customers, not building it to meet the exact needs means risking irrelevance — or worse, failure.

Building doesn’t always mean starting from scratch. We shouldn’t reinvent the wheel if existing solutions meet our needs. Instead, focus on developing what’s core to our business — the components that give us a competitive edge. Everything else, the foundational components that enable us to build our vision, can be sourced from the digital supply chain. I touched on this concept in the blog post, The Next TCP/IP Moment in Identity.

We had no interest in building identity federation standards like OpenID Connect or SAML, nor did we want to develop authentication protocols in-house. These are commodity features supported by almost all identity providers. However, we wanted to ensure the external identity service handling federation and authentication remained a ‘thin’ layer, separate from our business logic. This approach minimizes vendor lock-in.

Vendor lock-in isn’t just about making it easy to switch providers — though that’s certainly part of it. More importantly, it’s about retaining the freedom to innovate independently, which was our primary focus. Within the DevRev platform, Janus serves as the identity service, managing the identity object model and interacting with the external identity provider during organization and user onboarding flows. For the rest of the DevRev platform components, the external identity provider remains a black box, only interacting with Janus.

Choosing the right identity provider requires a thorough and thoughtful evaluation. Our objective wasn’t to find an identity provider that perfectly matched the DevRev object model but rather to focus on its core constructs, their relationships, and the integration effort with the DevRev platform.

In the DevRev platform, each organization can choose its preferred authentication method. For example, one organization might opt for Google login, while another may use their identity provider via SAML. The external identity provider’s object model must support this flexible relationship between organizations and their connections. For instance, the connection to Google (social) is made using OpenID Connect, and ideally, we should be able to reuse that Google connection (or any other social connection) across multiple organizations.

This means the identity provider’s object model should allow connections to be defined independently of a specific organization and provide the ability to associate them with any org. During our evaluation, we encountered one identity provider that could only define a connection within the scope of an individual organization (or tenant), which led us to reject it.

Apart from the key functional requirements for authentication and identity federation, we also weighed the scalability, availability, reliability, testability, compliance, and interoperability of the identity provider, and of course the cost at scale. We picked Auth0, mostly because of its lightweight organization model and because it could meet our requirement of scaling to 1 million organizations. Most of the prominent identity providers support almost all the key identity federation and authentication needs; the differentiators lie in the non-functional requirements and object relationships.

In our hunt for an external identity provider, we mostly looked for a SaaS product. Going for a SaaS product gave us the freedom to focus on our core business problem, and delegate anything related to the external identity provider to the SaaS vendor. A key reason one would pick an on-prem product over a SaaS product is the freedom to do heavy customizations. Building heavy customization in the identity provider is something we wanted to avoid from day 1.

Then again, I am not completely ruling out identity provider on-prem deployments. It would depend on the industry you are in and your business-specific needs. For example, highly regulated industries would pick an on-prem identity provider to keep all their customer data within their boundaries. I led the development of the open-source WSO2 Identity Server for almost 14 years, before joining DevRev, and saw many large-scale successful on-prem deployments.

When picking a SaaS identity provider, non-functional requirements play a key role. We have little control over them, and even from the SaaS vendor’s point of view, change requests related to non-functional requirements are hard to accommodate, as they could require core changes in their product. For example, Auth0 offers an availability of four 9’s. When you create a tenant in Auth0, it’s bound to one region, and internally they create a replica of that tenant in another availability zone. However, when the primary Auth0 region is down, we do not have the control to switch to the corresponding availability zone; the Auth0 team decides when to switch based on the severity of the incident. Auth0 also offers a private cloud deployment, but it’s a costly option.

Another limitation we found in Auth0 (and in most of the other prominent SaaS identity providers as well) is the lack of data replication between regions. For example, when you pin your tenant to the Auth0 US-1 region, all the API calls from any region globally must be routed to the US-1 region. Due to this limitation, we’ve witnessed increased latency in login API calls (OpenID Connect) for users connecting from Asia. This hurts the app load time, and at the moment we are building a solution in-house to minimize the impact.

The DevRev platform is built with 250+ microservices running on Kubernetes. Janus is one of them. Each service has its own CI/CD pipeline, which promotes each commit on main to Prod after running end-to-end tests in Dev and QA. We’ve integrated Auth0 into the Janus integration tests. We have a dedicated Auth0 tenant for that, and in the CI/CD pipeline, we connect to Auth0 via the API to create organizations and users, and then associate users with organizations. Once the tests are complete, we run a clean-up task to remove the resources created in Auth0, as it has certain resource restrictions.
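
As a simplified sketch of this kind of test setup and clean-up, using the Auth0 Management API v2 over plain HTTP: the tenant domain, token handling, and connection name below are placeholders, and retries and error handling are omitted, so this is illustrative rather than the actual pipeline code.

import requests

AUTH0_DOMAIN = "example-test-tenant.us.auth0.com"  # hypothetical test tenant
MGMT_TOKEN = "..."                                  # Management API token (placeholder)
BASE = f"https://{AUTH0_DOMAIN}/api/v2"
HEADERS = {"Authorization": f"Bearer {MGMT_TOKEN}"}

def create_org_with_user(org_name: str, email: str) -> tuple[str, str]:
    # Create an organization and a user, then add the user as a member.
    org = requests.post(f"{BASE}/organizations", headers=HEADERS,
                        json={"name": org_name, "display_name": org_name}).json()
    user = requests.post(f"{BASE}/users", headers=HEADERS,
                         json={"email": email, "connection": "email"}).json()
    requests.post(f"{BASE}/organizations/{org['id']}/members", headers=HEADERS,
                  json={"members": [user["user_id"]]})
    return org["id"], user["user_id"]

def cleanup(org_id: str, user_id: str) -> None:
    # The clean-up step that, when skipped, eventually exhausts tenant limits.
    requests.delete(f"{BASE}/users/{user_id}", headers=HEADERS)
    requests.delete(f"{BASE}/organizations/{org_id}", headers=HEADERS)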

There was a time when the clean-up task failed to run and unfortunately went unnoticed. This filled up Auth0 resources and soon reached its limits. Auth0 has a limit on the size of the app metadata that can be associated with a user. We used the app metadata to associate the DevRev-generated organization identifier with an Auth0 user. Once we hit the limit, the Auth0 system became unstable, and none of the data in the test tenant could be accessed either via the UI or the API. This became a blocker for the prod promotion of the Janus service and all other services that relied on Janus for testing. As a quick workaround (while the Auth0 team was working on recovering the tenant), we had to spin up a new Auth0 tenant and use that in our tests. Usually, these limits enforced by Auth0 are way beyond our production needs, and we encountered this issue due to the failed clean-up task from our end.

Auth0 has a rate limit on its management API. During our tests, we used this management API to create organizations and users. Due to the nature of the tests, we started to hit the rate limits (even with proper retries) which resulted in some flaky tests. As a solution, to this and the resource limit issue we discussed before, we ended up mocking Auth0 in our integration tests, rather than directly hitting the Auth0 test tenant.

We follow GitOps at DevRev to automate and manage infrastructure and application deployments. Git is the single source of truth. This helps us take any change through a review process and audit all changes. We followed the same model with Auth0, where appropriate. For example, we use Auth0 actions to inject the DevRev-generated org/user identifiers into the tokens it produces. The actions are JavaScript code, which we keep in a Git repo and deploy to the Auth0 Prod tenant via Dev and QA. This helped us avoid any manual edits from the Auth0 management console and keep all three environments consistent.

Currently, not all operations on Auth0 are fully managed through GitOps. For example, we still update certain settings, such as token lifetimes for applications, directly through the Auth0 management console. Although these are infrequent tasks, it’s essential to have a complete audit trail in production for every operation and change made. While Auth0 provides logs, they are only retained for a month. To ensure long-term visibility, we push Auth0 logs to Datadog and set up Slack alerts for specific cases. Additionally, we’re exploring the use of the Auth0 Terraform provider to automate and manage all Auth0 operations via GitOps.

Passwords are painful, not just from the user’s point of view but also from the product side. If an app has to support password-based login, it must also build the supporting infrastructure to recover passwords, rotate them periodically, implement strong second-factor authentication, and so on. From day one, we made a design decision to avoid passwords.

We use Auth0 as an identity bridge. The contract between the DevRev app and Auth0 is defined by OpenID Connect. Auth0 helps us connect multiple other identity providers via SAML and OpenID Connect to authenticate users. These identity providers are either social identity providers or enterprise identity providers owned by DevRev customers. Following the identity bridge pattern helped us avoid storing user passwords, and introducing a new identity provider or authentication method requires no changes to the DevRev app; it can be done transparently.

This graph shows the growth of unique monthly active Dev users by connection type over the last 12 months. The growth in active users is healthy, with enterprise connections starting to dominate.

The identity bridge pattern effectively addresses the common challenges posed by the spaghetti-identity and identity-silos antipatterns. Spaghetti identity occurs when a SaaS application tries to establish direct, point-to-point integrations with multiple identity providers, leading to complex, tangled connections. On the other hand, identity silos arise when an application is limited to a single federation or authentication protocol, making it difficult and costly to introduce new protocols in the future. By adopting the identity bridge pattern, organizations can simplify integration and enable flexibility in supporting multiple protocols without the downsides of these antipatterns.

It’s also worth mentioning how we leveraged the identity bridge pattern to overcome a limitation in Auth0. Auth0 supports passwordless authentication with email OTP; however, it cannot be used on its own with Auth0 organizations — only as a second factor. This limitation likely arises from Auth0’s architecture, as the organizations model was introduced later. To overcome this and use email OTP as a passwordless method, we spun up another Auth0 instance without organizations and then federated to it from the primary Auth0 tenant (which has organizations enabled). While this approach works, it is far from ideal. We anticipate switching to native passwordless email OTP within our primary Auth0 tenant as soon as that feature becomes available.

Auth0 provides the latency numbers it introduces during login operations; however, that does not reflect the latency users experience at the edge. To address this concern, and to have better visibility into the OpenID Connect login flow, we started routing all login calls to Auth0 via our CDN provider, Fastly. On the Fastly Compute@Edge platform, we run an edge gateway, which pushes all the API stats to Google BigQuery for further analysis. This gives us the edge latency for all login operations, which is much closer to what users actually experience. The following graph shows the p90 latency numbers in milliseconds for the last 31 days against a set of selected regions, for the Auth0 token endpoint.

p90 latency numbers in milliseconds for the last 31 days against a set of selected regions, for the Auth0 token endpoint

All DevRev applications utilize the OpenID Connect authorization code flow. The token endpoint, highlighted in the graph above, is called once during the initial app load and periodically thereafter for token refreshes. While the periodic refreshes happen asynchronously and do not affect the user experience, the latency of the initial token request during page load is more critical.

Two main factors contribute to this initial latency:

Geographic latency — All users, regardless of location, connect to a centralized Auth0 instance in the US-East region.

Web-worker-based login flow — The DevRev web app, a single-page application (SPA), leverages this flow, which adds complexity and impacts performance during authentication.

We opted for the web-worker-based login flow to enhance security by storing user tokens in browser memory more securely. However, web worker memory is scoped to a single browser tab, which means that even if a user has an active Auth0 session, opening a new tab requires calling both the authorize and token endpoints during the initial page load. Upon analyzing the stats, we found that 91.1% of the calls to the Auth0 authorize endpoint were from users who were already authenticated. To address this, we developed an edge-based solution targeting these 91.1% of users, aiming to reduce the latency at the edge to almost zero. As of this writing, this solution is being tested in our development and QA environments and will soon be available in production.

91.1% of the calls to the Auth0 authorize endpoint were from users who were already authenticated (page reloads). Users, once logged in to the app, keep the login session. This graph shows the average number of active vs silent logins by weekday for the last 90 days. The edge-based solution we are working on will reduce the latency at the edge for login calls from already authenticated users by almost 100%.

In addition to publishing stats related to Auth0 API performance from the edge to Google BigQuery, we also send logs to Datadog. This allows us to trigger alerts in case of Auth0 failures and detect downtimes early. Moreover, this helps us identify which users and organizations are impacted during Auth0 outages — events that have occurred a few times in the past couple of years.

The contract between all the DevRev apps and the identity provider is based on OpenID Connect. However, we didn’t find the ID token to be particularly useful in the case of an SPA. An SPA is simply a “dumb” app or a public client, which means it cannot make any trustworthy decisions on its own. The access token is the key! (If you’re interested in learning more about the difference between the ID token and the access token, please check out this video.)

We use self-contained access tokens (JWTs), which include the user context with every request. These tokens have been widely adopted in the industry and were formalized as an IETF standard in 2021 (RFC 9068). While Auth0 has yet to support RFC 9068, this has had no impact on our use cases. In fact, for an SPA, using reference tokens (the other alternative to self-contained access tokens — also known as opaque access tokens) is not a viable option. Only the issuer of an opaque token can validate it, and it’s impractical for the gateway intercepting API requests to communicate with the issuer for token validation, for each request.

When using self-contained tokens, we also need to implement support for revocation. Unlike self-contained tokens, when a reference token is revoked, that request goes to the token issuer. The consumer of the token (such as a gateway) would be aware of the revocation because it communicates with the token issuer each time it validates a token. However, there’s no point in communicating with the issuer to revoke a self-contained token unless there’s a reliable way to propagate the revocation decision to all dependent systems. To address this, we introduced the auth-tokens.delete endpoint, allowing us to revoke access tokens upon user logout. We also maintain a cache of the revoked token IDs, with a TTL based on the token’s expiration.
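
A minimal sketch of this revocation-cache idea in Python, assuming the token’s jti claim is used as the revoked-token ID and Redis as the cache; the key naming and claim layout are assumptions for illustration, not DevRev’s actual implementation.

import time
import redis

r = redis.Redis()

def revoke(jti: str, exp: int) -> None:
    # Keep the revocation entry only until the token would expire anyway.
    ttl = max(int(exp - time.time()), 1)
    r.setex(f"revoked:{jti}", ttl, 1)

def is_revoked(jti: str) -> bool:
    # Called by the gateway after validating the JWT signature and expiry.
    return r.exists(f"revoked:{jti}") == 1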

Returning to the topic of the ID token: the DevRev app relies on the dev-users.self API to identify the user. Another reason we can’t rely on the ID token for user identification is that Auth0 cannot embed the necessary user information into it, as we store only a minimal set of user data in Auth0.

Most of the support queries we receive from our customers related to login are due to some kind of an issue with the OTP delivery. Even though the number of such queries is quite low (compared to the success cases), when it happens, it’s painful to the users and hurts the business. Most of the DevRev customers at the stage of evaluation use email OTP-based login and then later switch to login with their identity provider, with SAML or OpenID Connect. Apart from our direct customers, we also offer a support portal for our customers’ customers (who we call Rev users), which also offers login with email OTP.

Most email OTP-related issues arise from one of the following causes:

The user’s email is added to the email provider’s global suppression list.

The OTP email is marked as spam.

The OTP email is rejected by the user’s mailbox due to internal domain restrictions, or other similar factors.

Email service providers (ESPs) maintain global suppression lists to prevent sending emails to addresses that have previously bounced (due to invalid addresses) or opted out of receiving emails from certain senders. If a user’s email address is on such a list, the OTP email won’t be delivered. This can happen if the user’s email previously encountered delivery issues, such as soft or hard bounces.

Initially, we used SendGrid as our ESP for sending OTP emails as well as marketing emails. Although we used two separate SendGrid accounts for these purposes, SendGrid maintains a global suppression list shared across accounts when emails originate from the same domain — devrev.ai in our case. This meant that if a user unsubscribed from marketing emails, they were automatically added to the global suppression list, preventing them from receiving OTPs. In some cases, users didn’t even unsubscribe themselves; certain plugins handled it automatically.

This behavior was undesirable, as it significantly impacted OTP delivery. To address the issue, we switched to AWS SES for sending OTP emails, which reduced global suppression list-related problems by nearly 90%.

We once encountered a particularly interesting issue related to email OTPs, and while it wasn’t a delivery problem, it’s worth mentioning. The user received the OTP email, but every time they tried entering the OTP, it failed. What made this case even more intriguing was that we discovered the OTP had already been attempted from a different IP address just before the user tried it. This explained why the OTP kept failing — the user was attempting to use an OTP that had already been used.

After further troubleshooting, we discovered the root cause. Along with the OTP in the email, we had also embedded an OTP link that allowed users to log in directly by clicking it. Although this link was commented out in the HTML template and not visible to the user, the organization this user belonged to had strict security protocols that automatically scanned and clicked on links embedded in incoming emails. This security bot didn’t distinguish between commented-out links and visible ones, so it clicked the hidden link, sending a request to the Auth0 server and invalidating the OTP before the user could use it. The issue was resolved by completely removing the commented-out OTP link from the email template.

Many email providers have spam filters that automatically classify certain emails as spam based on various criteria, such as the content of the email, sending frequency, or the sender’s reputation. If the OTP email is flagged as spam, it ends up in the user’s spam or junk folder, leading to the user not receiving it in their primary inbox. This could occur if the email provider deems the sender domain as suspicious or if the email contains certain keywords that trigger spam filters. After moving from SendGrid to AWS SES and then carefully designing the email template, we could get the number of spam emails almost down to 0.

Every organization and user in the DevRev platform is assigned a unique, immutable, and persistent pseudonym known as a DON (DevRev Object Name). Every resource in the platform has its own DON. The format of a DON is: don:<service>:<partition>:(<type>/<id>)+, inspired by Amazon Resource Names (ARN).

For example, don:identity:dvrv-us-1:devo/xyz is the DON of an organization. Here, identity refers to the service name, and dvrv-us-1 is the partition identifier, which indicates the organization’s location. For example, an organization in the us-east-1 region has dvrv-us-1 as its partition ID, while one in ap-south-1 has dvrv-in-1.

The type field of the DON defines the type of the object it is attached to. Every key entity within the DevRev platform is associated with an object, and all objects share a common ancestor known as the Atom object. For example, a devo object type represents a Dev organization, while devu represents a Dev user. A Dev organization is a company or team that uses the DevRev platform, and its members are referred to as Dev users. Customers of a Dev organization are called Rev users, who belong to Rev organizations (also known as workspaces). Both Rev organizations and Rev users operate within the scope of a Dev organization. For instance, the DON of a Rev organization would be: don:identity:dvrv-us-1:devo/xyz:revo/pqr.
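
A rough sketch of parsing a DON of the form don:<service>:<partition>:(<type>/<id>)+ in Python; the field names and the partition-to-region table below are illustrative assumptions, not DevRev’s actual code.

import re

DON_RE = re.compile(r"^don:(?P<service>[^:]+):(?P<partition>[^:]+):(?P<path>.+)$")

def parse_don(don: str) -> dict:
    m = DON_RE.match(don)
    if not m:
        raise ValueError(f"not a DON: {don}")
    # The path is a chain of <type>/<id> segments separated by colons.
    objects = [tuple(seg.split("/", 1)) for seg in m.group("path").split(":")]
    return {"service": m.group("service"),
            "partition": m.group("partition"),
            "objects": objects}

don = parse_don("don:identity:dvrv-us-1:devo/xyz:revo/pqr")
# e.g. route a request at the edge based on the partition embedded in the org DON
REGION_BY_PARTITION = {"dvrv-us-1": "us-east-1", "dvrv-in-1": "ap-south-1"}  # illustrative
print(don["objects"], REGION_BY_PARTITION[don["partition"]])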

All identity objects in the DevRev platform follow a predefined schema, ensuring consistency and standardization across all organizations. This common schema facilitates smooth integration and uniformity across the platform. However, recognizing that different organizations may have unique requirements, we also provide the flexibility to extend the Dev user and Rev user objects with custom fields specific to a given Dev organization.

Customizability is essential for organizations that need to tailor their identity models to accommodate unique workflows, industry requirements, or specific customer data needs. For instance, a Dev organization might need to capture additional fields such as department codes, job roles, or region-specific preferences that go beyond the standard schema. Moreover, when importing Rev user objects from external systems like Salesforce or Zendesk, these objects may come with their own set of attributes. The ability to extend and adapt the schema allows organizations to seamlessly integrate and migrate from these external systems to the DevRev platform.

All personally identifiable information (PII) of DevRev users is stored exclusively in Janus, making it the single source of truth for PII. Other services in the platform reference users through their DONs. For example, when a client calls the works.get API to retrieve a work item, the request goes through the DevRev gateway, which makes a gRPC call to the corresponding service. The service returns the work item data, including the DONs of the users who created and are assigned to the item.

The DevRev platform is built with 250+ microservices running on Kubernetes. Fastly Compute@Edge serves as the entry point at the edge for all DevRev services. At the origin, the DevRev API gateway intercepts all incoming traffic. Once the gateway verifies the JWT accompanying an API request, it dispatches the request to the appropriate service.

Since raw user DONs are not user-friendly, the gateway resolves them into user details, such as display name, email, and full name, by querying Janus (the API composition pattern). These user summaries are cached in Redis to avoid repetitive calls to Janus (from the gateway), ensuring faster response times for future requests.

A DON attached to a user or an organization is an immutable identifier. It’s not only immutable; it also embeds information that can be resolved on its own. For example, we use the partition ID embedded in the organization DON to route requests to the corresponding region of the organization, in our multi-regional routing implementation built at the edge. Designing a system based on immutable identifiers needs to be done with care, because the cost of a bad design decision is high. For example, we embed the Dev organization ID into the user DON, which means we cannot move a user to another Dev organization, and a user cannot be part of multiple Dev organizations. This is a conscious decision we made to scope all resources in the DevRev platform under a Dev organization.

However, we made a mistake in our Rev user implementation. The Rev user DON embedded both the Dev organization ID and the corresponding Rev organization ID. This meant a Rev user could not switch Rev organizations or be part of multiple Rev organizations. This was not a requirement initially, but as we kept learning from our customers, we realized that limiting a Rev user to one Rev organization was too restrictive a constraint. We made the Rev user scoped at the Dev organization level and removed the Rev organization ID from the DON. This was a painful change.

The Dev organization ID (don:identity:dvrv-us-1:devo/xyz), which is embedded into a DON is not incremental. An incremental ID can expose sensitive information, such as the total number of organizations created over time. Malicious actors could exploit this to infer patterns about system growth or attempt ID enumeration attacks to access or manipulate unauthorized resources. Non-incremental IDs make it harder to guess or predict other organization IDs, improving security and reducing the risk of ID enumeration.

The randomly generated IDs provide global uniqueness, meaning IDs can be safely generated across distributed systems without the risk of collision. This allows for a more flexible, decentralized architecture where different systems or regions can generate IDs without coordinating with a central system.

We initially used a 48-bit random value as the Dev organization ID. Once base62 encoded, it becomes a string of either 8 or 9 characters. When we moved to the multi-regional deployment, we wanted to ensure the Dev organization ID is unique across all regions by adding a regional prefix, for example, “1” for us-east1. However, since the previously generated Dev organization IDs were already 8 or 9 characters long, this introduced a challenge: a new region could generate a prefixed ID of 9 characters that matched an ID generated in another region before the regional prefix was introduced. To address this, we first ensured the base62-encoded 48-bit value was always 9 characters by adding padding, and then added the regional prefix on top, effectively making the Dev organization ID a fixed-length random string of 10 characters.
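
A small Python sketch of this fixed-length ID scheme, assuming a 48-bit random value, base62 encoding left-padded to 9 characters, and a 1-character regional prefix; the alphabet ordering and prefix values are assumptions for illustration.

import secrets
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 characters

def base62(n: int, width: int) -> str:
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    # Left-pad so the encoded value always has a fixed width.
    return "".join(reversed(out)).rjust(width, ALPHABET[0])

def new_dev_org_id(region_prefix: str = "1") -> str:
    value = secrets.randbits(48)             # 48-bit random value
    return region_prefix + base62(value, 9)  # always 10 characters total

print(new_dev_org_id())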

What does deleting a Dev organization mean? A Dev organization has many other resources linked to it within and outside the identity space (or Janus). Dev users, Rev users, workspaces, accounts, service accounts, system users, and preferences are all identity-related objects, while conversations, articles, tickets, and issues, for example, are objects outside the identity space.

This is a common problem in a distributed system. What we need is a cascade delete. Implementing a cascade delete in a distributed system requires careful consideration of consistency, availability, fault tolerance, and scalability. Unlike a traditional monolithic system, where database constraints and relationships can directly enforce cascade deletes, distributed systems involve multiple services, databases, and possibly different regions, making the process more complex.

When a Dev organization is deleted, we mark it as soft-deleted. A soft delete marks a Dev organization as deleted without actually removing it from the database. This is, in general, useful to avoid cascading failures in case other services depend on the deleted entity. When a Dev organization is marked as deleted, it triggers a CDC (change data capture) event, and we have a dedicated service (garbage collection service), listening to these events and taking actions based on the type of the object being deleted and the dependent object types.

Once the garbage collection service picks up the event to soft delete an organization, it spins up a Temporal workflow, which triggers the cascade delete of the dependent objects after a retention period. As you may have figured out already, this is a recursive process, and we only need to specify the first level of dependencies.

In general, there are two common saga patterns for cascade delete implementation, orchestration-based saga and choreography-based saga. In the orchestration-based saga pattern, a central coordinator service orchestrates the cascade delete by calling each service that manages dependent objects in a defined sequence. It ensures the cascade completes across all services or triggers compensating actions (e.g., undo the deletion) if a step fails. In the choreography-based saga pattern, each service listens for events and performs its part of the cascade delete, triggering further events to continue the process. We picked the orchestration-based saga pattern over the choreography-based saga.
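
A highly simplified, in-memory sketch of the orchestration idea: the orchestrator declares only first-level dependencies and recurses through them, deleting dependents before the parent. The object types and the fake store are illustrative; retries, compensating actions, and the retention period handled by the real workflow are omitted.

DEPENDENTS = {
    "dev_org": ["rev_org", "dev_user", "workspace"],  # first-level dependencies only
    "rev_org": ["rev_user"],
}

class FakeStore:
    """Stands in for the per-object services the orchestrator would call."""
    def __init__(self, children):
        self.children = children  # (parent_id, child_type) -> [child_ids]
        self.deleted = []
    def list_children(self, parent_id, child_type):
        return self.children.get((parent_id, child_type), [])
    def delete(self, object_type, object_id):
        self.deleted.append((object_type, object_id))

def cascade_delete(store, object_type, object_id):
    # Depth-first: delete dependents before the parent object itself.
    for child_type in DEPENDENTS.get(object_type, []):
        for child_id in store.list_children(object_id, child_type):
            cascade_delete(store, child_type, child_id)
    store.delete(object_type, object_id)

store = FakeStore({("devo/xyz", "rev_org"): ["revo/pqr"],
                   ("revo/pqr", "rev_user"): ["revu/abc"]})
cascade_delete(store, "dev_org", "devo/xyz")
print(store.deleted)  # rev_user, then rev_org, then the dev_org itself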

At DevRev, we leverage a range of databases — including MongoDB, PostgreSQL, and DynamoDB — each selected to address specific use cases. For all identity-related data managed by Janus and the STS (which we’ll discuss later in this blog), we rely on MongoDB. MongoDB is ideal for transactional (OLTP) workloads, excelling in scenarios that demand high write throughput and the ability to manage large volumes of concurrent transactions. Its built-in replication via replica sets provides automatic failover and data redundancy across multiple nodes, enhancing both reliability and fault tolerance. To ensure data consistency, we utilize the majority write concern and read from the primary node. Currently, we are focusing on improving read and write performance in this setup.

MongoDB also offers horizontal scaling through sharding, where each shard operates as a replica set. Data is distributed across shards using a shard key, which is a field (or combination of fields) that determines how MongoDB partitions and routes data. Although we currently operate with a replica set without sharding, we’ve designed each collection in the identity database with a Dev organization ID field that can serve as the shard key, allowing us to seamlessly enable sharding when needed.
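
A minimal pymongo sketch of the settings described here: majority writes, primary reads, and a dev_org_id field indexed so it can later serve as the shard key. The database, collection, and field names are illustrative, not the actual Janus schema.

from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")  # a replica set URI in practice
db = client.get_database(
    "identity",
    write_concern=WriteConcern(w="majority"),   # acknowledged by a majority of nodes
    read_preference=ReadPreference.PRIMARY,     # read your own writes from the primary
)

# Every identity collection carries a dev_org_id; indexing it now means the
# collection can later be sharded on the same key without a redesign.
db.dev_users.create_index([("dev_org_id", 1)])

# Enabling sharding later (requires a sharded cluster, so left commented out):
# client.admin.command("enableSharding", "identity")
# client.admin.command("shardCollection", "identity.dev_users", key={"dev_org_id": 1})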

Unlike traditional relational databases, which require predefined schema structures, MongoDB’s document-oriented nature supports schema-less design. This flexibility has allowed us to rapidly develop and iterate on identity objects without the need to define a schema or handle migrations upfront.

Along with MongoDB we also use Google BigQuery, which is a fully managed, serverless, and highly scalable data warehouse designed for fast SQL-based queries and real-time analytics on large datasets. We are using the MongoDB Kafka connector to capture change data capture (CDC) events from MongoDB and push them to BigQuery. This also acts as an audit for all changes at the data level.

SCIM support had always been planned but was deprioritized due to other tasks. We’re excited that it is now an integral part of the DevRev identity platform. SCIM offers a standardized protocol that streamlines the exchange of identity information between identity providers and external applications, significantly reducing manual processes and minimizing errors. It ensures that user data is synchronized and consistent across all systems. Furthermore, SCIM enhances security by automating the revocation of user access when employees leave or change roles, reducing the risk of unauthorized access and ensuring proper access management.

DevRev provides a SCIM endpoint for provisioning users and groups from an organization’s identity provider. Initial testing was done using Okta as the identity provider, and the DevRev SCIM integration will soon be available on the Okta Integration Network. Following that, we plan to expand availability to Azure and Google platforms. As a DevRev platform customer, the DevRev organization has already implemented SCIM for integration with Okta, streamlining identity management within the organization.

The identity bridge pattern (which we discussed before) facilitates trust mediation between the identity provider trusted by the DevRev platform (Auth0) and external identity providers that authenticate users from DevRev organizations. This pattern is primarily focused on human identities. To bridge the human identity into the services, we’ve introduced a security token service (STS). The STS implementation is inspired by the RFC 8693. This adds another level of indirection in trust, and all DevRev services only need to trust the STS. You can read more about STS from this blog: Securing north/south and east/west traffic @ DevRev.

STS is a key part of the DevRev identity architecture, which helps to make the identity provider a thin layer, which we discussed earlier in this blog. An Auth0 token tells us who the user is and does not embed any PII data other than the email address. However, when exchanging an Auth0 token for an STS-issued token, we have the flexibility to enrich the token with the data available in the DevRev platform.

The STS is capable of issuing tokens of different types. This graph shows the number of tokens issued by the STS for the last 30 days.

The STS is capable of issuing tokens of different types. For example Personal Access Tokens (PAT) for Dev users, Application Access Tokens (AAT) for service accounts, Rev session tokens for Rev users, and so on. Each of these token types is bound to a set of processing rules. For example, while issuing a Rev session token, the STS talks to Janus, and just-in-time provisions the Rev user, if they do not exist in the system. Another example would be how we handle delegation workflows. When a snap-in wants to act on behalf of a Rev user, it authenticates with its AAT (which represents the snap-in service account) and requests a Rev session token. Also, the STS-issued tokens are used in the multi-regional setup to facilitate certain region-to-region interactions.
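
A rough sketch of what exchanging an Auth0-issued access token for an STS-issued token looks like in the spirit of RFC 8693; the STS endpoint URL and response shape are assumptions about an internal service, not a public API.

import requests

STS_TOKEN_ENDPOINT = "https://sts.example.devrev.internal/token"  # hypothetical endpoint

def exchange_token(auth0_access_token: str,
                   requested_type: str = "urn:ietf:params:oauth:token-type:jwt") -> str:
    # Standard RFC 8693 token-exchange parameters.
    resp = requests.post(STS_TOKEN_ENDPOINT, data={
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token": auth0_access_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "requested_token_type": requested_type,
    })
    resp.raise_for_status()
    return resp.json()["access_token"]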

In this blog post, we highlighted key aspects of the DevRev identity platform and the lessons we’ve learned throughout its development. This is an ongoing effort, and as we continue to gain insights from our customers, we constantly discover new ways to enhance the platform and tackle exciting challenges in the identity and security landscape.

The lessons learned in building a secure, scalable, and reliable identity platform @ DevRev was originally published in FACILELOGIN on Medium, where people are continuing the conversation by highlighting and responding to this story.


Kerri Lemole

Explaining Verifiable Credentials and Open Badges 3.0

Part 1: The Trust Model of Open Badges

When the Open Badges v 0.5 specification was first launched by the Mozilla Foundation in 2012, the intention was to recognize learning that happens at any time, anywhere, and anyhow. The trust of the badges was based on 1) the persistent hosting of the credential data and 2) the content of the credential in the context it was being evaluated (including evidence, which was weighed much more heavily in early open badges). While digital signatures were included in subsequent versions, trust shifted to the badging platforms, which would presumably check the identity of issuers before allowing them to use their platform.

The challenges with this have been:

1) assuming that the badges will always be available to the earner. Trust is based on the presumption that badging platforms persist and will continue a relationship with the issuer, ensuring that even if an earner’s email address has changed, they will still have access to their badges.

2) ensuring that once a badge has been awarded to an earner, its contents do not change. There is no assured permanence for web platforms. Databases can be altered purposely or accidentally, they can be hacked, and some badging platforms allow their clients to change badge descriptions after they’ve been awarded.

With the most recent update of Open Badges, version 3.0, a new trust model has been adopted that makes it possible for earners to

have access to their digital credentials for as long as they think they will need them,

decide which online identity they would like associated with their credentials, and

know that their badges will be perceived as being trustworthy because they have been digitally signed by an identified issuer and can prove that their badge data has not changed since they claimed it.

1EdTech Open Badges 3.0 (OBv3) is based on a standard developed and maintained by the World Wide Web Consortium (W3C) called Verifiable Credentials (VCs). The W3C was founded by the creator of the web, Tim Berners-Lee, and is responsible for providing the standards that make the web work. VCs can be used to prove any claim online. OBv3 builds upon the VC standard with the purpose of proving claims made about educational achievements. These credentials have historically included badges and micro-credentials but can now also be used for degrees, certifications, and licenses. This means that earners could prove all of their achievements the same way across the web, with convenience and security.
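
For a sense of the shape involved, here is a rough, illustrative Open Badges 3.0 credential expressed as a Python dict. The context URLs, identifiers, and values are placeholders, and the 1EdTech OBv3 and W3C VC specifications remain the normative references.

badge_credential = {
    "@context": [
        "https://www.w3.org/ns/credentials/v2",                   # VC context (version may differ)
        "https://purl.imsglobal.org/spec/ob/v3p0/context.json",   # OBv3 context (illustrative URL)
    ],
    "type": ["VerifiableCredential", "OpenBadgeCredential"],
    "issuer": {"id": "https://example.edu/issuers/1", "type": "Profile", "name": "Example University"},
    "validFrom": "2024-11-01T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:earner123",  # the identity the earner chooses to be known by
        "type": "AchievementSubject",
        "achievement": {
            "id": "https://example.edu/achievements/intro-to-vcs",
            "type": "Achievement",
            "name": "Introduction to Verifiable Credentials",
            "criteria": {"narrative": "Completed the course and passed the assessment."},
        },
    },
    # A digital proof (for example a Data Integrity proof or a JWT signature) would be
    # attached by the issuer so verifiers can detect any change to the credential data.
}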

This blog post is Part 1 of a series of content explaining W3C Verifiable Credentials and Open Badges 3.0. Stay tuned for Part 2 — “How do Verifiable Credentials and Open Badges 3.0 Work?”

For more information on the DCC, including membership, projects, and events, visit our website, wiki, and Linkedin. To receive our newsletter and invitations to webinars and events sign up for our community mailing list.

Explaining Verifiable Credentials and Open Badges 3.0 was originally published in Digital Credentials Consortium on Medium, where people are continuing the conversation by highlighting and responding to this story.


Damien Bod

Microsoft Entra ID App-to-App security architecture


This article looks at the different setups when using App-to-App security with Microsoft Entra ID (OAuth client credentials). Microsoft Entra App registrations are used to configure the OAuth clients and resources. For each tenant, an Enterprise application is created for the client App registration when the consent is granted. The claims in the access token are validated as much as possible to prevent unauthorized clients from using the resource.

Code: https://github.com/damienbod/GrpcAzureAppServiceAppAuth

Use a single ME-ID App registration for client and resource

A single Microsoft Entra ID App registration can be used for both the client and the resource specification. The aud and the azp claims in the access token will have the same value in this setup. The roles claim is of no benefit here. This could be considered a bad architecture because the ME-ID App registration is used for two separate intentions: the client requesting the access token and the resource providing the service. Deployment is simplified as only one App registration is required. The setup cannot be used for multiple clients.

The access token created for this setup will have the same value for the aud claim and the azp claim as the client and the resource are the same. Both of the claims should be validated to prevent other clients accessing the resource.

{ "aud": "19893e32-3f4d-4c5a-b5ca-27891cf75666", "iss": "https://login.microsoftonline.com/7ff95b15-dc21-4ba6-bc92-824856578fc1/v2.0", "iat": 1727688828, "nbf": 1727688828, "exp": 1727692728, "aio": "k2BgYDD6x3jLSW9uSapm1it3vtdnC66c40v83qzsVLhQga//6EsA", "azp": "19893e32-3f4d-4c5a-b5ca-27891cf75666", "azpacr": "1", "oid": "a517247c-86ba-43d9-8e02-7cf7918c15f4", "rh": "0.AR8AFVv5fyHcpku8koJIVlePwTI-iRlNP1pMtconiRz3VmaFAAA.", "roles": [ "access_as_app" ], "sub": "a517247c-86ba-43d9-8e02-7cf7918c15f4", "tid": "7ff95b15-dc21-4ba6-bc92-824856578fc1", "uti": "DBleEOFcqk-rcld809IQAA", "ver": "2.0" }

In the Azure portal, the Microsoft Entra ID App registration is configured to add an application App Role. This role is returned in the access token as shown above. It is not required in this setup.

The access token can be validated using Microsoft.Identity.Web in a Web API.

builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddMicrosoftIdentityWebApi(builder.Configuration.GetSection("AzureAd"));

builder.Services.AddAuthorization(options =>
{
    options.AddPolicy("ValidateAccessTokenPolicy", validateAccessTokenPolicy =>
    {
        // Validate id of application for which the token was created
        // In this case the CC client application
        validateAccessTokenPolicy.RequireClaim("azp", "19893e32-3f4d-4c5a-b5ca-27891cf75666");

        // only allow tokens which used "Private key JWT Client authentication"
        // https://docs.microsoft.com/en-us/azure/active-directory/develop/access-tokens
        // Indicates how the client was authenticated. For a public client, the value is "0".
        // If client ID and client secret are used, the value is "1".
        // If a client certificate was used for authentication, the value is "2".
        validateAccessTokenPolicy.RequireClaim("azpacr", "1");
    });
});

Use separate ME-ID App registrations for client and resource

This architecture allows for separation of clients and is required if more than one client can use the resource. The different clients can be identified in the application using the azp claim or the oid claim. The aud claim has the resource value and must be validated. Separate clients allow you to distribute a different client certificate or secret to each client, and audit logs can also be kept per client. The application can also implement separate business logic or authorization based on the client ID or object ID (the azp or oid claims).

The aud claim returns the resource App registration client_id and the azp claim returns the client App registration client_id. As two App registrations are used, the claims have different values. The roles claim is also returned in the access token. The roles permission is required to set up the client and resource relationship in the portal, but it is not required for authorization if the other claims are validated correctly. It must be validated that only the allowed clients acquired the access token for the resource, and not any application access token from the same tenant.

{ "aud": "1a03257f-18a2-4cfa-81c1-d3cfaba6b09e", "iss": "https://login.microsoftonline.com/7ff95b15-dc21-4ba6-bc92-824856578fc1/v2.0", "iat": 1727685006, "nbf": 1727685006, "exp": 1727688906, "aio": "k2BgYFCeXa1fbNX34Odf08MOyzuu8k5eIO5xfYJi0rL41gkxx9QB", "azp": "912584f0-442c-41e3-87b3-3232edd82940", "azpacr": "1", "oid": "eb11b352-b9cc-489b-9e50-5eae9ee92e09", "rh": "0.AR8AFVv5fyHcpku8koJIVlePwX8lAxqiGPpMgcHTz6umsJ6FAAA.", "roles": [ "SharedAppRoleForAppToAppClients" ], "sub": "eb11b352-b9cc-489b-9e50-5eae9ee92e09", "tid": "7ff95b15-dc21-4ba6-bc92-824856578fc1", "uti": "Poa-FXigHkWg2mrtySMPAA", "ver": "2.0" }

Claim values returned in the access token:

aud

This MUST be validated; it contains the client_id of the resource ME-ID App registration.

roles: SharedAppRoleForAppToAppClients

This can be used to validate the access token if multiple clients can access the resource. It is not required if the aud claim is validated. When using OAuth client credentials and an App-to-App client (no delegated user), the default scope is used to access the resource and all roles from the App registration are included in the access token.

azp

The azp claim can be used to lock down the resource to the intended clients. Its value is different for each client; if you validate it, only the allowed clients can use the resource.

tid

The tid claim can be used to validate the tenant which requested the resource. It is important to validate this when using multi-tenant App registrations, if the resource is not for “All” Microsoft tenants. Only allow the tenants intended for the resource.

oid

The oid claim is the object ID of the Microsoft Entra ID Enterprise application created for the client App registration. If you are using only a single client and validate the oid, then you can prevent other Enterprise applications from using the resource.

Application implementation in ASP.NET Core

The resource can be implemented using Microsoft.Identity.Web, if it is an ASP.NET Core Web API application.

// Add services to the container.
builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddMicrosoftIdentityWebApi(builder.Configuration.GetSection("AzureAd"));

builder.Services.AddAuthorization(options =>
{
    options.AddPolicy("ValidateAccessTokenPolicy", validateAccessTokenPolicy =>
    {
        // Validate the id of the application for which the token was created,
        // in this case the CC client application.
        // Works with multi-tenant App registrations.
        validateAccessTokenPolicy.RequireClaim("azp", builder.Configuration["AzureAd:ClientId"]!);

        // Value of the Azure App registration where the role is defined (resource)
        validateAccessTokenPolicy.RequireClaim("aud", builder.Configuration["AzureAd:Audience"]!);

        // Single tenant Enterprise application object ID.
        // Only validate if locking down to a single Enterprise application.
        validateAccessTokenPolicy.RequireClaim("oid", builder.Configuration["AzureAd:Oid"]!);

        // Only allow tokens acquired using a client secret ("azpacr" == "1")
        // https://docs.microsoft.com/en-us/azure/active-directory/develop/access-tokens
        // azpacr indicates how the client was authenticated. For a public client, the value is "0".
        // If client ID and client secret are used, the value is "1".
        // If a client certificate was used for authentication, the value is "2".
        validateAccessTokenPolicy.RequireClaim("azpacr", "1");
    });
});
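If the resource uses a multi-tenant App registration but should only serve specific tenants, the tid claim could additionally be pinned in the same policy. This is a sketch, not part of the linked sample; the tenant ID is a placeholder:

// Sketch: inside the "ValidateAccessTokenPolicy" above, restrict the resource
// to known tenants when using a multi-tenant App registration.
// The tenant ID is a placeholder; RequireClaim accepts multiple allowed values.
validateAccessTokenPolicy.RequireClaim("tid",
    "7ff95b15-dc21-4ba6-bc92-824856578fc1");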

The application configuration would look like this, with your own tenant and client IDs:

"AzureAd": { "Instance": "https://login.microsoftonline.com/", "Domain": "damienbodhotmail.onmicrosoft.com", "TenantId": "7ff95b15-dc21-4ba6-bc92-824856578fc1", "ClientId": "912584f0-442c-41e3-87b3-3232edd82940", // aud claim in the access token, Azure App registration client ID "Audience": "1a03257f-18a2-4cfa-81c1-d3cfaba6b09e", "Oid": "eb11b352-b9cc-489b-9e50-5eae9ee92e09" },

Is the App Role required?

The App Role is not required in the application implementation in any of these setups, as the aud claim can be used to validate the resource. Because the default scope is used, all roles in the resource App registration are included in the access token.

The role is required to add a permission in the client ME-ID App registration for a separate resource App registration. In other words, the role is needed in the Microsoft Entra ID portal if you use different App registrations for clients and resources.

Multi client, Multiple resources

In more complex architectures, the solution might have multiple services. A single client can use multiple resources.

The Azure App roles are required to assign the resources to the clients. The application requesting the access token for a resource must specify the default scope of the Microsoft Entra App registration resource. This means that in Microsoft Entra ID, a separate access token is used for each resource. A client can request an access token for any resource where the role has been assigned. In the resource implementation, i.e. the API's access token validation, the App roles can be used to authorize the request. The aud claim can also be used, in which case the roles are not required.
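As a hedged sketch of the client side of this setup (not from the linked sample; the second resource ID is a made-up placeholder and both resources are assumed to use api:// Application ID URIs), the client requests one access token per resource by using each resource's default scope:

using Microsoft.Identity.Client;

// Placeholders: one client App registration calling two resource App registrations.
var tenantId = "7ff95b15-dc21-4ba6-bc92-824856578fc1";
var clientId = "912584f0-442c-41e3-87b3-3232edd82940";

IConfidentialClientApplication app = ConfidentialClientApplicationBuilder
    .Create(clientId)
    .WithClientSecret("<client-secret>")
    .WithAuthority($"https://login.microsoftonline.com/{tenantId}")
    .Build();

// Microsoft Entra ID issues a separate access token per resource, so each
// resource's default scope is requested in its own call.
var serviceAToken = await app
    .AcquireTokenForClient(new[] { "api://1a03257f-18a2-4cfa-81c1-d3cfaba6b09e/.default" })
    .ExecuteAsync();

var serviceBToken = await app
    .AcquireTokenForClient(new[] { "api://22222222-2222-2222-2222-222222222222/.default" })
    .ExecuteAsync();

Each token then carries the aud claim of the resource it was requested for, so the validation shown earlier still applies per resource.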

Notes

If you are using multiple clients, separate the client and the resource specifications. If you only have a single client for an API, then you can just deploy a single Azure App registration for both client and resource. If you are a purist and require that the App registrations match the implementations and are architecturally correct, use one App registration for each application: one for the client and one for the resource. If requirements change, you can always split this later without complication. Less is more.

Links

https://github.com/AzureAD/microsoft-identity-web/wiki

https://learn.microsoft.com/en-us/entra/identity-platform/

Saturday, 05. October 2024

Werdmüller on Medium

The two Fediverses

For some, it’s about growth. For others, it’s a movement. They must work together. Continue reading on Medium »

For some, it’s about growth. For others, it’s a movement. They must work together.

Continue reading on Medium »


Jon Udell

Geothermal power in the North Bay

I was aware of The Geysers, a geothermal field about 35 miles north of my home in Santa Rosa, but I never gave it much thought until my first bike ride through the area. Then I learned a number of interesting things. It’s the world’s largest geothermal field, producing more than 700 megawatts. It accounts … Continue reading Geothermal power in the North Bay

I was aware of The Geysers, a geothermal field about 35 miles north of my home in Santa Rosa, but I never gave it much thought until my first bike ride through the area. Then I learned a number of interesting things.

It’s the world’s largest geothermal field, producing more than 700 megawatts.

It accounts for 20% of California’s renewable energy.

The naturally-occurring steam was used up almost 30 years ago, and steam is now recharged by pumping in 11 million gallons of sewage effluent daily, through a 42-mile pipeline, from the Santa Rosa plain.

That daily recharge is implicated in the region’s frequent small earthquakes. (But nobody seems too worried about that, and maybe it’s a good thing? Many small better than one big?)

An article in today’s paper reports that AB-1359, signed last week by governor Gavin Newsom, paves the way for new geothermal development in the region that could add 600 megawatts of geothermal production.

How much electric power is that? I like to use WolframAlpha for quick and rough comparisons.

So, 2/3 of a nuke plant. 4/5 of a coal-fired power plant. These kinds of comparisons help me contextualize so many quantitative aspects of our lives. They’re the primary reason I visit WolframAlpha. I wish journalists would use it for that purpose.

Thursday, 03. October 2024

The Pragmatic Engineer

The Pulse #109: Open source business model struggles at Wordpress

Also: OpenAI’s biggest-ever fundraise even as key people keep quitting; why executive recruiters ignore tech professionals, and more

Today, we cover:

Industry pulse. Microsoft won’t follow Amazon in 5-day RTO, Cloudflare auto-mitigates world-record DDoS attack, California nearly regulates AI companies, Revolut starts selling performance management framework as a product, and more.

Open source business model struggles: WordPress. Automattic, the company behind WordPress, is being sued by one of the largest WordPress hosting providers. The conflict fits into a trend of billion-dollar companies struggling to effectively monetize open source, and changing tactics to limit competition and increase revenue.

OpenAI: biggest-ever fundraise, as key people keep quitting. OpenAI raised $6.6B in funding at a $157B valuation, making it the largest fundraise ever. And yet, cofounders and executives continue to depart: this time it’s the CTO, the Chief Research Officer, and a VP of Research. Also, OpenAI will most likely finally become a for-profit.

Why executive recruiters ignore tech professionals. Connecting with executive recruiters is a helpful strategy for engineering leaders. A senior product director shares the key reasons these outreaches fail, and what you can do to improve your chances.

1. Industry pulse

Microsoft won’t follow Amazon in 5-day RTO

In Seattle, the two largest tech employers are Microsoft and Amazon. Both companies are willing and able to match each other’s compensation, so when Amazon announced two weeks ago that it’s mandating a strict 5-days-from-office policy from 1 Jan 2025, the question was whether Microsoft would follow.

The company has internally confirmed it will not. As per Business Insider, Microsoft EVP Scott Guthrie told employees that Microsoft has no plans to tighten remote work policies, which means Microsoft workers can expect to keep spending 2-3 days per week in the office as a hybrid arrangement. I expect experienced Amazon software engineers who are unhappy about their workplace’s 5-days-per-week policy, and who live within commuting distance of a Microsoft office, to start applying for open positions at Microsoft.

Cloudflare auto-mitigates world-record DDoS attack

Read more

Tuesday, 01. October 2024

The Pragmatic Engineer

Bug management that works (Part 1)

Finding and triaging bugs, fixing bugs on the spot instead of ‘managing’ them, and how to make time for bug fixing

Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. To get articles like this in your inbox, every week, subscribe:

Subscribe now

Before we start: The Pragmatic Engineer Podcast launched last week with episode #1: AI tools for software engineers, but without the hype. New episodes come every other Wednesday. Thank you to everyone who listened to this first one. If you enjoy podcasts, please do add it on Apple, Spotify, YouTube or in your favorite player – you will have episodes automatically show up, while also helping the show.

How do you deal with bugs in the software products you build? This topic seems very under-discussed, but it affects pretty much every software team. To find out what professionals think, we reached out to two dozen engineering leaders and software engineers, who kindly shared approaches that work for their team and organization.

This topic is evergreen, and one that has outsized (potentially decisive!) impact on product quality and user experience. 

In this issue, we cover:

Finding bugs. Dogfood products where possible, invest in test automation, stay close to users, and more.

Reporting bugs. Making it easy for users to report bugs, having an accessible engineering team, and scaling the reporting process by team or company size.

Bug triaging. Figure out which bugs are critical by using bug priority rubrics, consider priority definitions, and get inspired by Spotify’s bug prioritization framework.

Don’t manage bugs: fix them! See bug fixing as gardening, not maintenance. Consider approaches like deleting all amassed bugs, and regularly pruning the backlog.

Zero bugs policy. An approach where every inbound bug is either fixed on the spot or deemed invalid. A subset of this approach is to fix all bugs for recently released features – which is what Figma does.

As a refresher, we have a few previous deepdives that are related to this topic:

Shipping to production

Incident review and postmortem best practices

Healthy oncall practices

How Big Tech does Quality Assurance (QA)

QA approaches across the industry

Thank you to everyone who contributed insights to this article: Ahmed Saher (engineering manager), Anaïs van Asselt (senior QA engineer), Andrea Sipos (product leader), Bernd Kampl (Software Engineering Team Lead), Jason Diller (VP of Engineering), John Cutler (product leader), Magnus L. Udbjørg (CTO), Michał Borek (Principal Engineer), Rebecca Frost (QA leader), Rebecca Holm Ring (engineering leader), Ruben Weijers (engineering manager), Ryan Hanni (Director of Engineering), Serdar Biyik (engineering manager), Walter de Bruijn (Head of Engineering Productivity)

1. Finding bugs

How can we be confident that the software we release has no known issues? We need to validate that it works correctly and there are common approaches for this.

Dogfood products. “Dogfooding” is the common practice of devs and employees using a product while they are building it, pre-release. For example, when I worked at Uber, the company issued free credits for staff to use the internal beta app for rides and food deliveries. At Skype, we ran internal beta versions of Skype for all internal chat and video calling. The business gave Skype credits to employees, so we could dogfood paid features like landline calls. Spotify does the same, as Rebecca Holm Ring, a former engineering manager there, shares:

“Each employee at Spotify is expected to be on the master or internal Spotify release, and report any issues they experience. A problem here though is that most Spotify employees will be iOS users, and so the Android app doesn’t get nearly enough testing before rolling out.”

Invest in test automation. Anaïs van Asselt – senior QA at Choco – shares their approach:

“Proactively finding and fixing bugs before they reach production is crucial. We invest in various test automation techniques to act as quality gates. Additionally, we practice bug hunting and gradually roll out new features to a limited user base, allowing us to catch and fix bugs early, reducing their overall cost.”

At smaller companies, be close to users. These places tend to be closer to users and can use this to build a relationship with users who get invested in the product and the reporting of bugs. Bernd Kampl – Software Engineering Team Lead at Anyline, a smaller cartech AI company – shares:

“As we are still smaller, we usually find bugs when our customers report them. If that's not the case, we identify them mostly during day-to-day development work.”

Magnus Udbjørg is CTO of Testaviva, a 50-person startup in Denmark. His take is that it’s optimal to build trust with users so they report issues:

“To find, identify, and prevent bugs, we foster a culture of dialogue and mutual understanding. We believe our developers need to understand our users deeply, and sometimes, our users need to understand the technical constraints we face. Open communication is key to building trust.”

A fair question is: why not do lots of testing themselves? The smaller the company and the fewer the customers, the more it feels too expensive to invest a lot in testing early on. Of course, there are always countercases, like how Figma spent nearly 3 years iterating on the first release in order to get the performance of their collaborative, web-based editor right, to give users a fun “wow moment.” Worth noting that Figma is a product the dev team used continuously while developing it, getting lots of testing during the building phase. We cover Figma’s engineering culture in a deep dive.

Consider alpha and beta testing at larger companies. Alpha and beta testing is about giving customers access to unfinished, less stable versions of a product. “Alpha” usually refers to the latest build, which has had little to no QA testing. “Beta” versions have had some testing, but not as much as a full release.

Rebecca Holm Ring shares how this worked at Spotify:

“There’s an Alpha testing, and Beta testing program, where external users are on the master branch and release branch respectively, and are expected to report any issues they encounter. With these versions of Spotify, it is also possible for a user to report a bug directly from the app, and logs will be attached as the bug report will be automatically added in JIRA.”

Automation: testing and code analysis. Unit tests, integration tests, end-to-end tests, and other automated tests are great ways to catch regressions. A regression is a software bug introduced into a feature after the feature was working correctly; the feature has ‘regressed’ into a faulty state.

This is true for static code analysis and other tools that automate quality assurance. We cover more on these methods in Shipping to production and QA approaches across the industry.

Code reviews. These serve multiple purposes: offering a second pair of eyes to double-check code, spreading knowledge, enforcing not-yet-automated conventions, and more. Catching bugs before they make it into the codebase is an occasional side effect.

Even so, bugs can easily slip through code reviews, which are nowhere near a perfect way to defend against shipping bugs and regressions.

Define what a bug is. Users often report “bugs” when they mean missing features, so it can be helpful for teams to agree what a bug is and how to categorize them. In general, a bug is a flaw that results in a software product behaving incorrectly. Categorizations can be granular, like splitting bugs into concurrency bugs, syntax ones, arithmetic, logical errors, human errors and so on. 

The simplest categorization is to split bugs into functional ones, when the behavior of the software is clearly wrong, and non-functional ones, when a bug is revealed in things like a system slowing down, increased latency, and other harder-to-spot issues.

It might be helpful to devise your own categorizations, based on the type of bugs you observe, and in a way that’s helpful for your product and organization.

2. Users reporting bugs

Gathering bugs can be a great source of data, providing a sense of product quality for feedback to teams, the organization, or company. However, data quality depends on how good the bug reporting process is – and how likely people are to report bugs!

Great reports and data come from simple, suitable processes. Features of useful bug reports:

Useful metadata (e.g. version, device, system metrics)

Relevant context (e.g. on mobile while connected to bluetooth speaker and poor connectivity, on a server in this region during lunch hour, on a debug build with these feature flags active, etc)

Straightforward to reproduce, or have reproduction steps

Reported by users who trust a reported bug will be fixed

Bad reports can create extra work, and poor bug reporting processes can discourage people from recording issues in the first place. The result is a spiral of deteriorating product quality, with the engineering team clueless about how bad things really are.

To avoid an outcome like that, here are some processes tech companies use to support good bug reporting processes.

Make it easy to create quality bug reports. Walter de Bruijn, Head of Engineering Productivity at Miro suggests this is critical:

“The easier the reporting is, the more you will learn. For internally discovered bugs an internal #bugs Slack channel can go a long way. 

One of my best recommendations is that you start logging bugs properly: you can’t manage what you can’t see. You need a good, well-documented, and known process for reporting bugs and a follow-up.”

QA leader Rebecca Frost on why quality bug reports count:

“Capture meaningful metadata on your bugs now to help with insights later. 

Make sure your funnel for incoming bugs is helping not hurting you. If bugs are coming in from all over and the quality of the info on the bugs is a mess, consider routing bug reports through one team or group.”

Make the reporting process accessible. If creating a bug report is too complicated, it discourages reporting. There are ways to make it accessible:

Don’t make less technical users use unfamiliar tech jargon

Make it easy to find version numbers and other data required for bug reports. Even better, collect it automatically

Consider having engineers help support teams to know which questions to ask to get the right data for reports

An example of effective engineering support is at SF-based scaleup Ontra, as shared by director of engineering Ryan Hanni:

“We have an Engineering Support team that sits between Product Support/Customer Success and R&D, which helps bridge the customer with R&D teams potentially further away from that context.”

Scaling bug reporting processes

There’s no one process that works best everywhere. Here are some common approaches by company size:

Smaller companies and startups: bug reports are usually simple, and the reporting process is lean because time is precious and knowledge is dense. Such workplaces are small enough that most tech folks can keep tabs on what’s happening, and people can submit bug reports pretty easily. 

There’s rarely a need for formal processes. Here are some efficient, less formal ones:

Set up a #bug-reports channel on Slack/Teams or other chat product

Use a #bugs tag in the ticket system of choice to keep track of issues

Prefix bug fix pull requests or commits with [Bugfix]

… and similar, simple but effective conventions that anyone can start using

Mid-sized companies and scaleups: process matters more, and these places are big enough for it to be wasteful for everyone to keep tabs on reported bugs. There are also more bug reports, and it’s a time waster to get repeated information and metadata in bug reports. 

Bug report templates and processes also matter. Good onboarding and documentation for processes and standards for bugs can have a big impact on efficiency.

Large companies: investing in automated processes is worthwhile due to the size and nature of the business:

Noisy bug reporting means lots of duplicate reports

“JIRA ping pong”: tickets created in JIRA are continually reassigned between teams and end up ‘returned to sender’

Time wasted discussing bugs: lots of overhead on debating whether a bug is a feature request, who’s best placed to fix it, etc

Here’s a good example of what JIRA ping pong looks like. Engineering manager Rebecca Holm Ring shares how it plays out at a larger company:

“Bugs showing up in one user-facing feature could be caused by a number of different teams that own the deeper layers of the tech stack. Still, reports would always first land on the user facing team!

There would often be a game of ping pong, where bugs got sent back-and-forth between different teams, and sometimes found the right owners. In turn, these owners didn’t feel it was a high enough issue for them to fix. However, for the user facing team, it would often be a higher priority to get it resolved. As many different teams were involved in building the Spotify app, conflicting priorities often made bugs linger longer than they should have, and occasionally required intervention from the higher-ups to get these bugs prioritized by the owning team.” 

At larger companies, some things can help deal with an ever-growing pile of bug reports, and improve processes and tooling:

Automation across the bug reporting process

Automated data collection for gathering context

Bug reporting templates for internal use

Automatic code/team attribution for inbound reports

Ryan Hanni, director of engineering at Ontra, shares examples of manual and mostly-automated processes he’s implemented at different stages of an org’s lifecycle:

Manual process: 

Use a bug template in Trello, JIRA, or similar 

Notify Eng Support and copy the bug description to their slack channel 

→ Eng support aids in determining severity, and the team takes it from there

→ Eng Director uses a Google Spreadsheet to track all bugs, and includes details like report date and links

→ Eng Director does monthly follow ups on bugs, and adds details to the spreadsheet, like the timestamp of a fix. This spreadsheet is used for calculating stats like time-to-resolution for DORA metrics.

Partially Automated: 

Use a bug template in Trello, JIRA, or similar. 

With a press of a button, automatically post to Slack

→ Eng Support 'triages' the bug with the dev team in a thread

→ For high or medium severity bugs, support copies the details to an Airtable form and submits it 

→ The form submitted uses automation to:

Post to Slack

Send an email to internal google groups for stakeholders and departments

Automatically adds bug information into Airtable columns

→ Partially automated reporting:

Metrics for ‘Reported at’ date and severity are automatically filled in

Metrics are filed on a monthly basis to generate stats like DORA metrics and cycle time to fix (Time to Respond, Time to Fix, Total Time In Prod, etc). 

→ When a bug is fixed, teams reply to the automated email detailing when it will be released to customers

Mostly Automated: 

Use a bug template in Trello, JIRA, or similar. 

→ Use automation to send a message to Eng Support Slack.

→ Triage happens with the Dev Team and Eng Support

→ Work moves through Jira columns (Kanban style), into a Done column 

→ A release to production happens and the Jira card gets an automatic “released at” date

→ After a release to Prod, Jira sends a slack/email message with issue details to close the loop for a resolved bug of a given severity.

→ Reporting occurs by generating reports in JIRA. Run reports with all relevant metrics to the org to get information like cycle time, time to fix, response time, etc

→ Visuals for share-outs on the bug fixing process are manually generated in Google Sheets

3. Bug triage

You now have a process for the bug reports to flow in, so the next step is to figure out which ones are critical, which are duplicates, and which ones not to bother with. Here are some common approaches:

Read more

Tuesday, 01. October 2024

Mike Jones: self-issued

OpenID Connect specifications published as ISO standards

I’m thrilled to report that the OpenID Connect specifications have now been published as ISO/IEC standards. They are: ISO/IEC 26131:2024 — Information technology — OpenID connect — OpenID connect core 1.0 incorporating errata set 2 ISO/IEC 26132:2024 — Information technology — OpenID connect — OpenID connect discovery 1.0 incorporating errata set 2 ISO/IEC 26133:2024 — […]

I’m thrilled to report that the OpenID Connect specifications have now been published as ISO/IEC standards. They are:

ISO/IEC 26131:2024 — Information technology — OpenID connect — OpenID connect core 1.0 incorporating errata set 2
ISO/IEC 26132:2024 — Information technology — OpenID connect — OpenID connect discovery 1.0 incorporating errata set 2
ISO/IEC 26133:2024 — Information technology — OpenID connect — OpenID connect dynamic client registration 1.0 incorporating errata set 2
ISO/IEC 26134:2024 — Information technology — OpenID connect — OpenID connect RP-initiated logout 1.0
ISO/IEC 26135:2024 — Information technology — OpenID connect — OpenID connect session management 1.0
ISO/IEC 26136:2024 — Information technology — OpenID connect — OpenID connect front-channel logout 1.0
ISO/IEC 26137:2024 — Information technology — OpenID connect — OpenID connect back-channel logout 1.0 incorporating errata set 1
ISO/IEC 26138:2024 — Information technology — OpenID connect — OAuth 2.0 multiple response type encoding practices
ISO/IEC 26139:2024 — Information technology — OpenID connect — OAuth 2.0 form post response mode

I submitted the OpenID Connect specifications for publication by ISO as Publicly Available Specifications (PAS) for the OpenID Foundation in December 2023. Following the ISO approval vote, they are now published. This should foster even broader adoption of OpenID Connect by enabling deployments in jurisdictions around the world that have legal requirements to use specifications from standards bodies recognized by international treaties, of which ISO is one.

Before submitting the specifications, the OpenID Connect working group diligently worked through the process of applying errata corrections to the specifications, so that the ISO versions would have all known corrections incorporated.

Having successfully gone through the ISO PAS submission process once, the OpenID Foundation now plans to submit additional families of final specifications for publication by ISO. These include the FAPI 1.0 specifications, and once they’re final, the eKYC-IDA specifications and FAPI 2.0 specifications.

Thanks to all who helped us achieve this significant accomplishment!

Tuesday, 01. October 2024

Michael Ruminer

Google’s NotebookLM, RAG and Then Some

I played around just a little bit with Google’s NotebookLM. It’s a Retrieval-Augmented Generation (RAG) tool. One thing it does is convert your content into a podcast format and it provides an impressive-sounding result. It's a little superficial on the content synthesis and definitely gimmicky, but it is a podcast after all. Its other features are your typical RAG chat interface. All I can say is

I played around just a little bit with Google’s NotebookLM. It’s a Retrieval-Augmented Generation (RAG) tool. One thing it does is convert your content into a podcast format and it provides an impressive-sounding result. It's a little superficial on the content synthesis and definitely gimmicky, but it is a podcast after all. Its other features are your typical RAG chat interface. All I can say is, give it a try, or check out my example effort.

Here is a link to the single source document I provided it – Self Sovereign Identity A Systematic Review Mapping and Taxonomy – and the link to the SSI Paper Podcast results.

Monday, 30. September 2024

Jon Udell

Making a Vote Forward checklist

In How and why to write letters to voters I discussed Vote Forward, my favorite way for those of us who aren’t in swing states to reach out to voters in swing states. The site works really well for adopting batches of voters, and downloading packets of form letters. As I close in on 1000 … Continue reading Making a Vote Forward checklist

In How and why to write letters to voters I discussed Vote Forward, my favorite way for those of us who aren’t in swing states to reach out to voters in swing states. The site works really well for adopting batches of voters, and downloading packets of form letters. As I close in on 1000 letters, though, I’m finding it isn’t great for tracking progress at scale. Here’s how my dashboard page looks.

With 50 bundles in play, many of which are farmed out to friends and neighbors who are helping with the project, it’s become cumbersome to keep track of which bundles are prepped (ready to mail) or not. Here is the checklist I needed to see.

VoteForward Dashboard Report

mmorg: 1-UNPREPPED
r23Pp: 2-UNPREPPED
v9Kbo: 3-UNPREPPED
wLMPw: 4-UNPREPPED
24L4o: 5-PREPPED
4nNnj: 6-PREPPED
5rQmV: 7-PREPPED
...
YV4dL: 48-PREPPED
zKjne: 49-PREPPED
ZrKJz: 50-PREPPED

If you’re in the same boat, here’s a piece of code you can use to make your own checklist. It’s gnarly; if you aren’t a programmer I advise you not even to look at it. Just copy it, then paste it into your browser’s developer console to have it open a new window with your report.

Vote Forward checklist maker (expand to copy)

javascript:(function(){
  // First part: Adjust height of divs with inline styles
  document.querySelectorAll('div[style]').forEach(div => {
    let inlineStyle = div.getAttribute('style');
    if (inlineStyle.includes('position: relative')) {
      div.style.height = '20000px'; // Set the height to 20000px
    }
  });
  // Introduce a delay before processing the list of items
  setTimeout(() => {
    const items = document.querySelectorAll('li.bundle-list-item.individual');
    let dataList = [];
    // Iterate over the items to capture data-testid and ID
    items.forEach(item => {
      let dataTestId = item.getAttribute('data-testid');
      // Use the id attribute of the input element to extract the ID
      const toggleInput = item.querySelector('input.slide-out-toggle');
      const toggleId = toggleInput ? toggleInput.getAttribute('id') : '';
      // Extract the ID part from the toggleId pattern "toggle-24L4o-PREPPED"
      const id = toggleId ? toggleId.split('-')[1] : 'ID not found';
      // Remove "bundle-" and the number part from dataTestId, keeping only "PREPPED" or "UNPREPPED"
      dataTestId = dataTestId.split('-').pop();
      // Push the data into the array
      dataList.push({ dataTestId, id });
    });
    // Sort UNPREPPED before PREPPED (exact match, since "UNPREPPED" contains "PREPPED"),
    // then by the ID within each group
    dataList.sort((a, b) => {
      const aUnprepped = a.dataTestId === "UNPREPPED";
      const bUnprepped = b.dataTestId === "UNPREPPED";
      if (aUnprepped && !bUnprepped) {
        return -1; // UNPREPPED comes before PREPPED
      } else if (!aUnprepped && bUnprepped) {
        return 1;
      }
      // Sort by ID if they belong to the same category
      return a.id.localeCompare(b.id);
    });
    // Prepare the output string
    let output = '';
    dataList.forEach((item, index) => {
      output += `${item.id}: ${index + 1}-${item.dataTestId}\n`;
    });
    // Open a new window with the output in a text area for easy copying
    let newWindow = window.open('', '', 'width=500,height=500');
    newWindow.document.write('<html><body><h2>VoteForward Dashboard Report</h2><pre>' + output + '</pre></body></html>');
    newWindow.document.close();
  }, 2000); // Adjust delay as needed
})();

Here are instructions for Chrome/Edge, Safari, and Firefox. You might need to tell your browser to allow the popup window in which it writes the report.

Chrome/Edge:
Open the VoteForward dashboard in your browser.
Open the developer console: Windows/Linux: Press Ctrl + Shift + J. Mac: Press Cmd + Option + J.
Paste the code into the console.
Press Enter to run the code.

Firefox:
Open the VoteForward dashboard in your browser.
Open the developer console: Windows/Linux: Press Ctrl + Shift + K. Mac: Press Cmd + Option + K.
Paste the code into the console.
Press Enter to run the code.

Safari:
Open the VoteForward dashboard in your browser.
Enable the developer console (if it’s not already enabled): Go to Safari > Preferences. Click the Advanced tab. Check “Show Develop menu in menu bar” at the bottom.
Open the developer console: Press Cmd + Option + C.
Paste the code into the console.
Press Enter to run the code.

It would be nice to have this as a built-in feature of the site but, as we come down to the wire, this may be a helpful workaround.

Thanks, again, to the Vote Forward team for all you do! It’s a great way to encourage voter turnout.

Friday, 27. September 2024

Bill Wendels Real Estate Cafe

Talk to Compensation Coach before signing showing agreement to maximize savings

#LegalHackers: Know about the new role emerging at the start of real estate transactions because homebuyers are now being asked to sign an agreement before… The post Talk to Compensation Coach before signing showing agreement to maximize savings first appeared on Real Estate Cafe.

#LegalHackers: Know about the new role emerging at the start of real estate transactions because homebuyers are now being asked to sign an agreement before…

The post Talk to Compensation Coach before signing showing agreement to maximize savings first appeared on Real Estate Cafe.

Thursday, 26. September 2024

Wrench in the Gears

What If The Totalitarian Game Is Actually A Participatory Play? Musings on CogNarr, Eusociality, and Synanon

Yesterday I put together a 40-minute video in which I talk about Karl Friston’s “Free Energy Principle,” a theory developed in the 1990s that centers the unique narrative world models we hold internally and how those models in consciousness are shaped by sensory input, the incoming information hurtling towards us through the feed every day. [...]

Yesterday I put together a 40-minute video in which I talk about Karl Friston’s “Free Energy Principle,” a theory developed in the 1990s that centers the unique narrative world models we hold internally and how those models in consciousness are shaped by sensory input, the incoming information hurtling towards us through the feed every day. I relate Friston’s work to a new effort by John Boik, a Houston-based bioengineer with a background in Oriental medicine who is now pursuing open source software in AI assisted direct democracy and community currency programs. If you’ve been following my work around Michael Levin’s research you can probably see the connection between Boik’s former career in cancer research and cooperative digital money systems – morphogenetic fields linked to expanding cognitive light cones.

In a recent presentation for the Active Inference Institute, based at UC Davis in Daniel Ari Friedman’s Entomology Lab (Deborah Gordon, Stanford ant behavior expert is an advisor), Boik laid out his plans for CogNarr, a system that would structure human “stories” in ways that would be intelligible to computers and facilitate collaboration at a scale of a half dozen people to several million. This parallels Thomas Seeley’s work on distributed decision-making described in his book “Honeybee Democracy.”

I am settling in here in Hot Springs and I met a lovely elder who placed in my hands a memoir written by her son who is around my age. It describes his educational journey as a divergent learner and his family’s decade-long involvement with the “utopian” community Synanon based in Marin County, CA with ties to influential group psychologists and likely an experiment tied to the Human Ecology / MK Ultra Project. The main campus was on the site of an old lodge built by Marconi for wireless telegraphy.

While founded as an addiction treatment program that used attack therapy, “The Game,” the community later attracted intellectuals who were interested in a collective “utopian” lifestyle. I believe Synanon was a testbed for exploring the opportunities and limits of a hive / colony-oriented communal behavior. In other words Synanon was an ant computer lab. I get the feeling that while a lot of media attention was paid to increasingly unhinged adults with weapon stockpiles, including a rattlesnake placed in a lawyer’s mailbox, the primary area of investigation may have been the children.

It’s funny, with the cooler weather I’ve been enjoying my sunporch, but I’ve found that I have to keep an eye on the sugar ants. There are regularly 3-4 foragers exploring my Ikea Kallax furniture. One afternoon I got preoccupied with sewing and left a plate of sliced apples unattended for a few hours. Yowza – pheromonic stigmergy in action! It was shocking, but also quite fascinating to see how the superorganism works up close and personal. This video that is part of the Santa Fe Institute podcast offers helpful insights into this dynamic: “The Physics of Collectives.” I keep pushing the idea of the ant computer in front of people, because we are trained into linguistic and conceptual boxes that fuel polarity – libertarianism vs. socialism, etc. etc. when most of the roads are all using custom stories to take us to the same place – Conway’s Game of Life, counting down the iterations until we get the gliders coming on.

Abraham Maslow was involved with Synanon as well as being affiliated with Julius Stulman, Fritz Kunz, and Ervin Laszlo via the Foundation for Integrated Education. Maslow contributed a chapter to Stulman and Laszlo’s book “Emergent Man.” Synanon shut its doors in the 1990s, but spun out other programs including an abusive pioneer in the “troubled teen” space CEDU. Paris Hilton is an alumna of that program. Rory Kennedy, Bobby’s youngest sister, produced a documentary on Synanon that was released this spring by HBO. I speculate that this documentary as well as last year’s Netflix expose on San Patrignano are pre-bunking efforts to neutralize resistance to the institutional “wellness,” addiction and mental health treatment programs Bobby has planned for rural areas across the United States. My gut tells me these centers, like refugee camps, will be testbeds for tokenized eusocial behavioral experiments leading to advancements in collective decision making using emerging technologies. Nothing new under the sun, only this time we have nano-photonics in the mix to make social signaling more seamless and intuitive.

Here’s my video:

There were some good questions and comments on this video. Given how those tend to disappear, I chose to transcribe one exchange that provides some additional insight into how I’ve been thinking about things lately. I know my theory may sound far-fetched, but remember, consciousness is a coherence engine. I’ve been pulling in A LOT of disparate information for years. Trying to weave it into a meaningful narrative is tricky. Certainly the plot line that seems to be unfolding is NOTHING I would have ever expected. I’m not saying I’m correct, but I think there is enough information of concern that my thoughts do merit public consideration with respect to ethics, informed consent, and guard rails. The past four years have given me a crash course in the subtle practice of knowledge management refined by well-trained and well-funded social scientists and psychologists. All that I’m asking is that we actually engage in information streams outside of the ones being poured down our throats by mainstream media, academic gatekeepers, and social media feeds. But at the end of the day, I am proposing a pretty far out story. It makes sense that it’s going to be near to impossible to get much traction. Anyway, if you want to immerse yourself in the information fields that brought me to this place, I have linked a ton of my Kumu.io maps. Click away. I’m eager to hear what you come up with. 

Commenter One:

“What I see is the constant reinforcement of being in mind, abdication of physical being, being in your body and developing a relationship with our humanness is the most important connection. I wonder if we stay truly connected to our human-manifestation would that stop people going automatic and getting lost with narrative.”

Commenter Two:

“This right here. There aro a whole slew of us that have been guided towards esoteric info to the point that we’ve started commenting “who’s here in 24?” To 10 plus year old videos. They think the algorithm has discovered that people really like this….I just think this is a test group of heavy users of you tube. A particular…group of people who would be prone to action in the physical realm. It’s a test for how well it neutralizes them. Doesn’t matter to them if it’s real or empowering or helpful or healing….just does it prevent action by a physical being and bend them to spread denying the physical. Many in that group have kids…many may be teaching their children to place the physical 2nd behind mental. And if that’s the case – why not climb into the box and never leave?”

Commenter Three:

“I would be curious to know more about this. So, to clarify, YouTube is pushing videos on a group of people to see if they will remain docile? I’ve often wondered about certain spiritual viewpoints, and if they serve to keep people from sticking up for themselves. Any specifics you have on this subject would be fascinating to me, thanks.”

Alison’s Comment:

“Robert Bosnak’s take is that apocalypse narratives are quite generative. I do see the ascension storyline as running parallel with the Christian imminent apocalypse storyline. Based on experiences I’ve had over the past several years as well as natural cycles, I sense that we don’t have a grasp of what spacetime really is and that it can loop back and feature recursion. If that is the case, the linear narratives of some transformative end point seem likely socially constructed for some purpose I can’t conjecture about. I’m skeptical of both. If you try on the theory I’ve been developing around heart-soul-based nested and networked human massive parallel processing with access to the field through quantum biology and companion microbiome management, then the system would need two phases – a foraging out-of-body ego-death phase where individuals or groups go into the field to retrieve information which is then processed in-vivo with support from bacterial colonies that live in and on us. I sense that both are required, but training and practice will be needed so people can do it as second nature.

Again, in both apocalypse narratives, the storyline and teaming aspect is super important. Consider the industries that have been built up around both from New Age TV and webinars and books, etc. to Christian megachurches and of course the Vatican’s deep dive into social impact and biotech and Oral Roberts setting up Bible verse magic AI mirrors in student dorms. It never occurred to me that “social impact” was meant to be a hub for ongoing remote cognitive-biological monitoring (now for “wellness”) that will enable society to become a noetic superorganism available for spiritual on-demand information processing. Just like I believe the Georgia Guidestones were an engineered memetic trope to focus negative energy and fear. I also think the apocalypse stories are meant to structure consciousness so that when the intended program rolls out it can be embedded in a dramatic arc each particular groomed group is already comfortable with. The question though…is it a cookbook?”

Commenter One Follow Up:

“So with the embodied imagination, which sounds a bit like shamanic journeying, it is the mind which is the ‘driver’ to facilitate this third state which is embodied. In the mirror’d virtual world it is presumably still the original mind inhabiting this new embodiment? I wonder also what happens to the energy field of the actual heart & body in the new embodiment. Is this measured by the galvanic response also? Narrative means each group (being coalesced) can happily volunteer to inhabit this state and the body can then be harnessed and utilised.”

Alison’s Reply:

“I just wanted to state I am working through this theory and don’t have everything buttoned up – but as I picture it, it would be the physical body and etheric body working in tandem to facilitate the information processing – two cycles. So picture a bee scouting a site for a new hive. Bees do this “waggle dance” to communicate what they’ve found and make a case to their peers about why their location is good. Then, by consensus (think tokenized democracy) the bees come to a decision. So maybe we are using our etheric self to forage for information in the field, and we bring it back to “the collective,” but that collective isn’t a hive of bees, but rather a cosmos of microbiota and mitochondria who communicate not through dance vibrations, but perhaps through photonics in the fascia? Now picture all of us as agents linked in a parallel processing system via frequency facilitated by electroceuticals and liposomes, etc. in our “wellness” supplements. I’m kind of spitballing here and pardon me if I’m misinterpreting your framing, but I don’t think it’s – hey send your consciousness out into the ether and let the system use your physical body while you are gone. There is the issue of embodiment, narrative coherence, umwelt, and qualia. My logical self says that individuals need to maintain their integrity as agents over time in order to be productive in the collective decision making process. I think they want to be able to nest individual experiences and then aggregate them at a scale relevant to the particular “cognitive light cone” task. Think about ants – they are agents working within a social structure that facilitates complex emergence even if the individual ants have no clue what is going on at levels above their pay grade. As I have been researching Julian Gresser’s “intelligent heart” it seems that the torus field is useful for processing 4-D information. So that may be another reason, beyond the role of the microbiome, to integrate knowledge acquired through astral projection – to integrate the information in a space outside regular 3-D existence.

Also, Bosnak is very hip on group dreaming in VR “Asclepian” healing caves. My intuition tells me that tokens and the protocol layer could be an interdimensional language of manifestation. This paper shared with me today by Stephers has some interesting thoughts on consciousness and projecting geometry (sacred?). https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2020.01680/full

Interactive Map: https://embed.kumu.io/745bb8ce3aad470199efea9e9528807d#untitled-map?s=bm9kZS1aUERrT3Vxbg%3D%3D

I mention psychodrama and Gurdjieff in the video. Here is a map I’ve been working on related to this:

Interactive map JG Bennet and Gurdjieff’s “The Work”: https://embed.kumu.io/0bf33678ed9f4126e528c09beb40370e#untitled-map?s=bm9kZS1RdE5jbkJ0Tg%3D%3D

This map links Maslow’s theories of self-actualization with Ocean Protocol (Trent McConaghy and Jim Rutt’s Game B), token engineering, game mechanics, sustainability, and AGI:

Interactive Map – Maslow Self-Actualization: https://embed.kumu.io/4011b2b1901acf8b24cbb1119bbb9e1c#untitled-map?s=bm9kZS1ObjBud3VEdg%3D%3D
Source: https://embed.kumu.io/1ff16f690770c1d381e267f258358a32#untitled-map?s=bm9kZS1mVjhCN2ZOUw%3D%3D
Source: https://archive.org/details/emergentmanhisch00stul/mode/2up

Source: https://embed.kumu.io/51fe5ee0c16765f2c748936cc86c28ee#untitled-map?s=bm9kZS1JbG83a1dOZQ%3D%3D
Source: https://web.archive.org/web/20210531074023/https://littlesis.org/oligrapher/6930-arizona-state-games-and-education
Source: https://embed.kumu.io/035ea6860f513bab36fa1a9815d93a74#untitled-map?s=bm9kZS1ENDF4aWZZZA%3D%3D
Source: https://embed.kumu.io/d393e288b03d6a5ef05fc9bc6e51a6a6#untitled-map?s=bm9kZS05SndwcVNqQg%3D%3D
Source: https://embed.kumu.io/f77a772b1d3ed24cf4140d0a3d270348#untitled-map?s=bm9kZS1HR0M1SWphVg%3D%3D
Source: https://principledsocietiesproject.org/about_psp

Interactive map: https://embed.kumu.io/f662cd5969e9debc347b10601a5e3282#untitled-map?s=bm9kZS1Tc2dLUmd3Uw%3D%3D
Source: https://embed.kumu.io/097cefb708139ffa65e906e397da3824#untitled-map?s=bm9kZS1hVzVyR1FaeA%3D%3D
Source: https://embed.kumu.io/89566f995693843319e1b4970dac9dd1#untitled-map?s=bm9kZS1VSk9pY0JYTA%3D%3D

Lee Eyre, Boik’s collaborator on these papers, worked for Orange (ambient smart environments) for ten years. The video below is how the world looks when everything is run on a permission-based smart contract. This scenario presumes all agents have a digital identifier, which brings in Web3, blockchain, and spatial computing. Watch this with an eye towards interactions feeding into a personal story – qualia, umwelt.

LEDDA (Leda) is an interesting name / acronym choice here. 

Source: https://www.youtube.com/watch?v=z9ZCjd2rqGY
Source: https://www.researchgate.net/lab/Active-Inference-Institute-Daniel-Ari-Friedman

Source: https://embed.kumu.io/f03f97508c7993eef8aefa20cd265e98#untitled-map?s=bm9kZS1SUU1ZTkdvdA%3D%3D

Source: https://lodgeatmarconi.com/the-lodge/
Source: https://web.archive.org/web/20210110011432/https://littlesis.org/oligrapher/4196-adl-iot-education
Source: https://web.archive.org/web/20201025032840/https://littlesis.org/oligrapher/5647-blockchain-and-transmedia-learning
Source: https://web.archive.org/web/20210104142455/https://littlesis.org/oligrapher/6018-kneomedia-gamified-edu-tainment
Source: https://web.archive.org/web/20210531074023/https://littlesis.org/oligrapher/6930-arizona-state-games-and-education
Source: https://littlesis.org/oligrapher/5084-vr-gaming-surveillance-human-capital
Source: https://web.archive.org/web/20201204194140/https://littlesis.org/oligrapher/5757-dassault-digital-twin-simulations-blockchain-id
Source: https://web.archive.org/web/20210329114848/https://littlesis.org/oligrapher/6637-california-dyslexia-gamification
Source: https://web.archive.org/web/20210407065329/https://littlesis.org/oligrapher/6678-oakland-dyslexia-screening-pilot
Source: https://web.archive.org/web/20210129211602/https://littlesis.org/oligrapher/6317-biotech-microbiome-blockchain
Source: https://web.archive.org/web/20201207235929/https://littlesis.org/oligrapher/5719-produce-rx-food-pay-for-success-medicaid
Source: https://web.archive.org/web/20201224191109/https://littlesis.org/oligrapher/6016-wearable-health-tech-impact-investing


The Pragmatic Engineer

The Pulse #108: Elasticsearch unexpectedly goes open source again

Also: Amazon cuts the number of managers, engineers critiquing YouTube’s biggest critic, and key people keep quitting OpenAI.

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

Today, we cover:

Industry pulse. Meta previews next-gen AI glasses, X probably needs to pay severance to engineers it laid off, on-prem datacenter threat for AWS, tech jobs drying up, Arc’s embarrassing secur…

Read more

Wednesday, 25. September 2024

The Pragmatic Engineer

AI tools for software engineers, but without the hype – with Simon Willison (co-creator of Django)

Simon is one of the best-known software engineers experimenting with LLMs to boost his own productivity: he’s been doing this for more than three years, blogging about it in the open.

The first episode of The Pragmatic Engineer Podcast is out. Expect similar episodes every other Wednesday. You can add the podcast in your favorite podcast player, and have future episodes downloaded automatically.

Listen now on Apple, Spotify, and YouTube.

Brought to you by:

Codeium: Join the 700K+ developers using the IT-approved AI-powered code assistant.

TLDR: Keep up with tech in 5 minutes

On the first episode of the Pragmatic Engineer Podcast, I am joined by Simon Willison.

Simon is one of the best-known software engineers experimenting with LLMs to boost his own productivity: he’s been doing this for more than three years, blogging about it in the open.

Simon is the creator of Datasette, an open-source tool for exploring and publishing data. He works full-time developing open-source tools for data journalism, centered on Datasette and SQLite. Previously, he was an engineering director at Eventbrite, joining through the acquisition of Lanyrd, a Y Combinator startup he co-founded in 2010. Simon is also a co-creator of the Django Web Framework. He has been blogging about web development since the early 2000s.

In today’s conversation, we dive deep into the realm of Gen AI and talk about the following: 

Simon’s initial experiments with LLMs and coding tools

Why fine-tuning is generally a waste of time—and when it’s not

RAG: an overview

Interacting with GPTs voice mode

Simon’s day-to-day LLM stack

Common misconceptions about LLMs and ethical gray areas 

How Simon’s productivity has increased and his generally optimistic view on these tools

Tips, tricks, and hacks for interacting with GenAI tools

And more!

I hope you enjoy this episode.

In this episode, we cover:

(02:15) Welcome

(05:28) Simon’s ‘scary’ experience with ChatGPT

(10:58) Simon’s initial experiments with LLMs and coding tools

(12:21) The languages that LLMs excel at

(14:50) To start LLMs by understanding the theory, or by playing around?

(16:35) Fine-tuning: what it is, and why it’s mostly a waste of time

(18:03) Where fine-tuning works

(18:31) RAG: an explanation

(21:34) The expense of running testing on AI

(23:15) Simon’s current AI stack 

(29:55) Common misconceptions about using LLM tools

(30:09) Simon’s stack – continued 

(32:51) Learnings from running local models

(33:56) The impact of Firebug and the introduction of open-source 

(39:42) How Simon’s productivity has increased using LLM tools

(41:55) Why most people should limit themselves to 3-4 programming languages

(45:18) Addressing ethical issues and resistance to using generative AI

(49:11) Are LLMs plateauing? Is AGI overhyped?

(55:45) Coding vs. professional coding, looking ahead

(57:27) The importance of systems thinking for software engineers 

(1:01:00) Simon’s advice for experienced engineers

(1:06:29) Rapid-fire questions

Some takeaways:

If you are not using LLMs for your software engineering workflow, you are falling behind. So use them! Simon outlined a bunch of reasons that hold back many devs from using these tools – like ethical concerns or energy concerns. But LLM tools are here to stay, and those who use them get more productive.

It takes a ton of effort to learn how to use these tools efficiently. As Simon puts it: “You have to put in so much effort to learn, to explore and experiment and learn how to use it. And there's no guidance.” In related research we did in The Pragmatic Engineer about AI tools, with about 200 software engineers responding, we saw similar evidence: those who had not used AI tools for six months were more likely to have a negative perception of them. In fact, very common feedback from engineers not using these tools was “I used it a few times, but it didn’t live up to my expectations, and so I’m not using it any more.”

Use local models to learn more about LLMs. Running local models has two bigger benefits:

You figure out how to run them! It’s less complicated than one might think, thanks to tools like Hugging Face. Go and play around with them, and try out a smaller local model.

You learn a LOT more about how LLMs work, thanks to local models being less capable. So it feels less “magic”. As Simon said, “I think it's really useful to have a model hallucinate at you early because it helps you get that better mental model of what it can do. And the local models hallucinate wildly.”

Where to find Simon Willison:

• X: https://x.com/simonw

• LinkedIn: https://www.linkedin.com/in/simonwillison/

• Website: https://simonwillison.net/

• Mastodon: https://fedi.simonwillison.net/@simon

Referenced:

• Simon’s LLM project: https://github.com/simonw/llm

• Jeremy Howard’s fast.ai: https://www.fast.ai/

• jq programming language: https://en.wikipedia.org/wiki/Jq_(programming_language)

• Datasette: https://datasette.io/

• GPT Code Interpreter: https://platform.openai.com/docs/assistants/tools/code-interpreter

• OpenAI Playground: https://platform.openai.com/playground/chat

• Advent of Code: https://adventofcode.com/

• Rust programming language: https://www.rust-lang.org/

• Applied AI Software Engineering: RAG: https://newsletter.pragmaticengineer.com/p/rag

• Claude: https://claude.ai/

• Claude 3.5 Sonnet: https://www.anthropic.com/news/claude-3-5-sonnet

• ChatGPT can now see, hear, and speak: https://openai.com/index/chatgpt-can-now-see-hear-and-speak/

• GitHub Copilot: https://github.com/features/copilot

• What are Artifacts and how do I use them?: https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them

• Large Language Models on the command line: https://simonwillison.net/2024/Jun/17/cli-language-models/

• Llama: https://www.llama.com/

• MLC chat on the app store: https://apps.apple.com/us/app/mlc-chat/id6448482937

• Firebug: https://en.wikipedia.org/wiki/Firebug_(software)#

• NPM: https://www.npmjs.com/

• Django: https://www.djangoproject.com/

• Sourceforge: https://sourceforge.net/

• CPAN: https://www.cpan.org/

• OOP: https://en.wikipedia.org/wiki/Object-oriented_programming

• Prolog: https://en.wikipedia.org/wiki/Prolog

• SML: https://en.wikipedia.org/wiki/Standard_ML

• Stable Diffusion: https://stability.ai/

• Chain of thought prompting: https://www.promptingguide.ai/techniques/cot

• Cognition AI: https://www.cognition.ai/

• In the Race to Artificial General Intelligence, Where’s the Finish Line?: https://www.scientificamerican.com/article/what-does-artificial-general-intelligence-actually-mean/

• Black swan theory: https://en.wikipedia.org/wiki/Black_swan_theory

• Copilot workspace: https://githubnext.com/projects/copilot-workspace

• Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems: https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321

• Bluesky Global: https://www.blueskyglobal.org/

• The Atrocity Archives (Laundry Files #1): https://www.amazon.com/Atrocity-Archives-Laundry-Files/dp/0441013651

• Rivers of London: https://www.amazon.com/Rivers-London-Ben-Aaronovitch/dp/1625676158/

• Vanilla JavaScript: http://vanilla-js.com/

• jQuery: https://jquery.com/

• Fly.io: https://fly.io/

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.

Monday, 23. September 2024

Damien Bod

Implement a Geo-distance search using .NET Aspire, Elasticsearch and ASP.NET Core

This article shows how to implement a geo location search in an ASP.NET Core application using a LeafletJs map. The selected location can be used to find the nearest location with an Elasticsearch Geo-distance query. The Elasticsearch container and the ASP.NET Core UI application are setup for development using .NET Aspire. Code: https://github.com/damienbod/WebGeoElasticsearch Setup For … … Contin

This article shows how to implement a geo location search in an ASP.NET Core application using a LeafletJs map. The selected location can be used to find the nearest location with an Elasticsearch Geo-distance query. The Elasticsearch container and the ASP.NET Core UI application are set up for development using .NET Aspire.

Code: https://github.com/damienbod/WebGeoElasticsearch

Setup

For local development, .NET Aspire is used to set up the two services and the HTTPS connections between them. The services are configured in the Aspire AppHost project.

The Elasticsearch client is set up as a singleton and requires the connection configuration. This can be changed if, for example, an API key is used instead. The connection URL is read from the configuration, as are the secrets.

using Elastic.Clients.Elasticsearch;
using Elastic.Transport;

namespace WebGeoElasticsearch.ElasticsearchApi;

public class ElasticClientProvider
{
    private readonly ElasticsearchClient? _client = null;

    public ElasticClientProvider(IConfiguration configuration)
    {
        if (_client == null)
        {
            var settings = new ElasticsearchClientSettings(new Uri(configuration["ElasticsearchUrl"]!))
                .Authentication(new BasicAuthentication(
                    configuration["ElasticsearchUserName"]!,
                    configuration["ElasticsearchPassword"]!));
            _client = new ElasticsearchClient(settings);
        }
    }

    public ElasticsearchClient GetClient()
    {
        if (_client != null)
        {
            return _client;
        }

        throw new Exception("Elasticsearch client not initialized");
    }
}

Create Index with mapping

The index cannot be created by adding a document because the mapping would be created incorrectly using the default settings. Instead, the mapping can be created for the defined index using the Mappings extension from the Elastic.Clients.Elasticsearch NuGet package. This was added to the client project via the Aspire.Elastic.Clients.Elasticsearch package. The mapping is really simple and probably not complete for a production index; some keyword optimizations are required. The detailsCoordinates field is defined as a GeoPointProperty.

var mapping = await _client.Indices.CreateAsync<MapDetail>(IndexName, c => c
    .Mappings(map => map
        .Properties(
            new Properties<MapDetail>()
            {
                { "details", new TextProperty() },
                { "detailsCoordinates", new GeoPointProperty() },
                { "detailsType", new TextProperty() },
                { "id", new TextProperty() },
                { "information", new TextProperty() },
                { "name", new TextProperty() }
            }
        )
    )
);

The created mapping can be validated using a GET request to “IndexName”/_mapping. This returns the definitions as a JSON response.

https://localhost:9200/mapdetails/_mapping
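
If you prefer to check the endpoint from a script rather than a browser, here is a minimal sketch in Python (not part of the original article; it assumes the basic-auth credentials from the configuration above and a self-signed local development certificate):

import requests

# Hypothetical credentials: substitute the values from your configuration/secrets.
response = requests.get(
    "https://localhost:9200/mapdetails/_mapping",
    auth=("elastic", "your-password"),
    verify=False,  # local development certificate only; do not disable verification in production
)
print(response.json())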

Documents can be added to the Elasticsearch index using the IndexAsync method.

response = await _client.IndexAsync(dotNetGroup, IndexName, "1");

Search Query

A Geo-distance query is used to find the distance from the selected location to the different Geo points in the index. This uses latitude and longitude coordinates.

public async Task<List<MapDetail>> SearchForClosestAsync(
    uint maxDistanceInMeter, double centerLatitude, double centerLongitude)
{
    // Bern Lat 46.94792, Long 7.44461
    if (maxDistanceInMeter == 0)
    {
        maxDistanceInMeter = 1000000;
    }

    var searchRequest = new SearchRequest(IndexName)
    {
        Query = new GeoDistanceQuery
        {
            DistanceType = GeoDistanceType.Plane,
            Field = "detailsCoordinates",
            Distance = $"{maxDistanceInMeter}m",
            Location = GeoLocation.LatitudeLongitude(
                new LatLonGeoLocation
                {
                    Lat = centerLatitude,
                    Lon = centerLongitude
                })
        },
        Sort = BuildGeoDistanceSort(centerLatitude, centerLongitude)
    };

    searchRequest.ErrorTrace = true;

    _logger.LogInformation("SearchForClosestAsync: {SearchBody}", searchRequest);

    var searchResponse = await _client
        .SearchAsync<MapDetail>(searchRequest);

    return searchResponse.Documents.ToList();
}

The results are returned sorted using the Geo-distance sort, which puts the location with the smallest distance first. This ordering is used for the map display.

private static List<SortOptions> BuildGeoDistanceSort(
    double centerLatitude, double centerLongitude)
{
    var sorts = new List<SortOptions>();

    var sort = SortOptions.GeoDistance(
        new GeoDistanceSort
        {
            Field = new Field("detailsCoordinates"),
            Location = new List<GeoLocation>
            {
                GeoLocation.LatitudeLongitude(
                    new LatLonGeoLocation
                    {
                        Lat = centerLatitude,
                        Lon = centerLongitude
                    })
            },
            Order = SortOrder.Asc,
            Unit = DistanceUnit.Meters
        }
    );

    sorts.Add(sort);
    return sorts;
}

Display using Leaflet.js

The ASP.NET Core application displays the locations and the search results in a Leaflet.js map component. The location closest to the center is displayed differently. You can click around the map and test the different searches. The data for this display is provided by the Geo-distance query.

Testing

The applications can be started using the .NET Aspire host project. One runs as a container, the other as a project. The container requires a Docker Desktop installation on the host operating system. When the applications start, the containers need to boot up first; an optimization would be to remove this boot-up delay.

Notes

Using Elasticsearch, it is very simple to create fairly complex search requests for your web applications. With a bit of experience, complex reports and queries can be implemented as well. You can also use Elasticsearch aggregations to group and organize results for data analysis tools and reports. .NET Aspire makes it easy to develop locally and use HTTPS everywhere.

Links

https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-query.html

https://leafletjs.com/

https://www.elastic.co/guide/en/elasticsearch/reference/current/explicit-mapping.html

Using Elasticsearch with .NET Aspire

Sunday, 22. September 2024

Kent Bull

KERI Series: Understanding Self-Addressing Identifiers (SAID)

What is a self addressing identifier, a SAID? What does this mean and how is a SAID created and verified? This post answers these questions. We show a generalized process for calculating SAIDs and delve into the encoding format for CESR-compliant self addressing identifiers. Examples with three popular algorithms, SHA2-256, […]

What is a self addressing identifier, a SAID? What does this mean and how is a SAID created and verified? This post answers these questions. We show a generalized process for calculating SAIDs and delve into the encoding format for CESR-compliant self addressing identifiers. Examples with three popular algorithms, SHA2-256, SHA3-256, and Blake3-256, show specifics of applying the general process. This general process can be used for calculating SAIDs with other cryptographic algorithms.

For those who want to skim there are pictures below including bit diagrams that illustrate exactly what is happening.

What is a SAID?

Fundamentally, a SAID is a cryptographic digest of a given set of data and is embedded within the data it is a digest of. A CESR-style SAID pads the digest to 33 bytes and adds a type code into the padded digest to replace resulting Base64 pad characters. It looks like this:

HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6

This is a SHA3-256 digest encoded in the CESR format.

What is the CESR format? It is the Base64 URL Safe encoding of the raw digest along with some front-padding of zero bits and a type code, as shown in detail below. From the above SAID, the ‘H’ character is the type code. The rest of the string is composed of Base64 URL Safe characters.

Why Base64? More Space

Why was Base64 encoding used rather than something like hex encoding? Because Base64 encoding allows maximally compact text encoding of data using a well-known encoding protocol of alphanumeric characters (0-9, a-z, A-Z, -_). As compared to hexadecimal (“hex”) encoding Base64 encodes 6 bits of data per Base64 character whereas hex encoding encodes 4 bits of data per Base64 character, so Base64 can store 50% more data in the same space compared to hex. This helps reduce bandwidth and power costs, optimizing performance overall.
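
To make the size difference concrete, here is a small illustration (not part of the original post) comparing hex and Base64 URL Safe encodings of the same 32 random bytes:

import os
from base64 import urlsafe_b64encode

raw = os.urandom(32)                        # e.g. a 32 byte digest
hex_text = raw.hex()                        # 4 bits per character
b64_text = urlsafe_b64encode(raw).decode()  # 6 bits per character
print(len(hex_text))  # 64 characters
print(len(b64_text))  # 44 characters (including one '=' pad character)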

Note on Hash or Digest Terminology

A note on terminology, sometimes digests are called hashes or hash values. The technical definition of the term hash refers to a hash function. Hash functions transform data into a fixed-size string. This fixed-size string is the digest, the output of a hash function.

Back to SAIDs, the fact that a SAID can be embedded in the data it is a digest of is why it is called “self addressing.” The digest is essentially a unique identifier of the data it is embedded in.

A SAID (Self-Addressing Identifier) is a special type of content-addressable identifier based on an encoded cryptographic digest that is self-referential.

Composable Event Streaming Representation ToIP Specification – Section 12.6 – Dr. Samuel M. Smith

What is a content addressable identifier? A content addressable identifier is an identifier derived from the content being stored which makes a useful lookup key in content addressable storage, such as IPFS or a key-value store database like LevelDB, LMDB, Redis, DynamoDB, Couchbase, Memcached, or Cassandra.
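
As a toy illustration (not part of the original post), a SAID can serve directly as the key in any ordinary key-value store:

# A hypothetical in-memory content-addressable store keyed by SAID.
store = {}

said = "HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6"
document = {"d": said, "first": "john", "last": "doe"}

store[said] = document  # the content is stored under its own identifier
assert store[said]["first"] == "john"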

Embedding a digest changes the source data and hash, right?

How can the SAID digest be accurate given that placing the SAID in the data it identifies changes the data, thus producing a different digest? The way SAIDs accomplish this is with a two-step generation and embedding process.

Two step SAID generation and embedding process

During SAID calculation the destination field of the SAID is filled with pound sign filler characters (“#”) up to the same length as the SAID. The digest is then calculated, encoded, and placed in the destination field.

The reverse occurs for verification of a SAID.

The SAID is replaced with filler ‘#’ characters up to the same length as the SAID. The digest is then calculated, encoded, and compared with the original SAID.

How does the generation step work? This question kicks off a larger discussion about CESR-style encoding of cryptographic digests using pre-padding and type codes. First, let’s start with some code examples that cut right to the chase. You can come back to these examples after reading the post if they don’t make sense to you at first.

Code examples with multiple algorithms

Let’s start with some code examples showing how to create a correct SAID including the appropriate pre-padding characters. For additional understanding come back and review these examples after you have read the sections on 24 bit boundaries, pad characters, and pad bytes.

For now, say you want to use other cryptographic digest algorithms to create your SAIDs. How would you go about doing that?

It is as easy as changing your hashing function and then using the type code from the CESR Master Code Table that corresponds to your desired digest algorithm.

The following code examples in Python illustrate the process for each of the following algorithms, Blake2b-256, Blake3-256, and SHA2-256. The SHA3-256 algorithm is shown above in the example in the main body of the article.

Filler ‘#’ characters in digest ‘d’ field

The following examples all use a raw value whose digest field ‘d’ contains filler ‘#’ pound sign characters, which will be explained later. The “d” digest field must contain the same number of filler characters as the eventual SAID that will replace them.

Creating a Blake2b-256 SAID – Step By Step

For a Blake2b-256 SAID with Python you just change the hash function and specify a digest size.

import hashlib
from base64 import urlsafe_b64encode

raw_value = b'{"d":"############################################","first":"john","last":"doe"}'
digest = hashlib.blake2b(raw_value, digest_size=32).digest() # <-- See the different algorithm blake2b
padded_digest = b'\x00' + digest
encoded = urlsafe_b64encode(padded_digest)
b64_str_list = list(encoded.decode()) # convert bytes to string of chars for easy replacement of 'A'
b64_str_list[0] = 'F' # replace first 'A' character with 'F' type code
b64_str = ''.join(b64_str_list) # convert string of chars to string with .join()
assert b64_str == 'FFfZ4GYhyBRBEP3oTgim3AAfJS0nPcqEGNOGAiAZgW4Q'
assert len(b64_str) == 44 # length should still be 44 characters, 264 base64 bits, a multiple of 24 bits

Creating a Blake3-256 SAID – Step By Step

Blake3-256 is even easier, though it requires the blake3 library.

import blake3
from base64 import urlsafe_b64encode

raw_value = b'{"d":"############################################","first":"john","last":"doe"}'
digest = blake3.blake3(raw_value).digest() # <-- See the different algorithm blake3.blake3
padded_digest = b'\x00' + digest
encoded = urlsafe_b64encode(padded_digest)
b64_str_list = list(encoded.decode()) # convert bytes to string of chars for easy replacement of 'A'
b64_str_list[0] = 'E' # replace first 'A' character with 'E' type code
b64_str = ''.join(b64_str_list) # convert string of chars to string with .join()
assert b64_str == 'EKITsBR9udlRGaSGKq87k8bgDozGWElqEOFiXFjHJi8Y'
assert len(b64_str) == 44 # length should still be 44 characters, 264 base64 bits, a multiple of 24 bits

Creating a SHA2-256 SAID – Step By Step

And finally SHA2-256 is also easy, just changing the hash function used:

import hashlib
from base64 import urlsafe_b64encode

raw_value = b'{"d":"############################################","first":"john","last":"doe"}'
digest = hashlib.sha256(raw_value).digest() # <-- See the different algorithm sha256
padded_digest = b'\x00' + digest
encoded = urlsafe_b64encode(padded_digest)
b64_str_list = list(encoded.decode()) # convert bytes to string of chars for easy replacement of 'A'
b64_str_list[0] = 'I' # replace first 'A' character with 'I' type code
b64_str = ''.join(b64_str_list) # convert string of chars to string with .join()
assert b64_str == 'IDuyELkLPw5raKP32c7XPA7JCp0OOg8kvfXUewhZG3fd'
assert len(b64_str) == 44 # length should still be 44 characters, 264 base64 bits, a multiple of 24 bits

Now on to a visual introduction.

Visual Introduction to SAID

Here is a SAID using the SHA3-256 algorithm on the sample JSON object used in this post.

HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6

Adding this SAID to a document looks like taking the following JSON,

computing the SAID, encoding it, and placing it in the SAID field, or digest field, which is the “d” field in this example:

The ‘H’ character is highlighted here to draw attention to the fact that it is a special character. This special character is the type code in the CESR Master Code Table. This indicates the type of cryptographic algorithm being used, SHA3-256 in this case.

I see a problem…

Those new to calculating and encoding SAIDs often encounter a problem here. If you take the raw Base64 encoded value of the JSON value {"d":"","first":"john","last":"doe"} then you end up with the string value eyJkIjoiIiwiZmlyc3QiOiJqb2huIiwibGFzdCI6ImRvZSJ9, which is nowhere close to the value shown in the picture of HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6. Why are they different?

Doing a plain Base64 encoding of the JSON bytes misses an important step, the encoding step referred to above. The rest of the post dives deep into this encoding as it shows and explains how to construct a correct, CESR-encoded, SAID digest and explains the rationale behind why CESR encoding is designed the way it is.

Five parts of a SAID

As mentioned earlier, a SAID is a cryptographic digest. Specifically, it is a kind of digest usable as a content addressable identifier, and it is embedded in the content it identifies. SAIDs were invented by Dr. Samuel Smith as a part of his work on key event receipt infrastructure (KERI), authentic chained data containers (ACDC), and composable event streaming representation (CESR).

To understand how SAIDs work you must learn the interplay of five different concepts including:

Bit boundaries – aligning on 24 bit boundaries using pre-padded bytes on the left/front of raw bytes

Hash values – hashing input bytes with hashing functions to produce output hash values (digests)

Encoding with the URL-safe variant of Base64 encoding

Using type codes to indicate the type of hashing function and size of digest

The two-pass SAID calculation and embedding process

This article specifically covers SAIDs that are encoded in the CESR format. These CESR-style SAIDs

use pre-padding of pad bytes for bit padding to align on 24 bit boundaries,

are compatible with a variety of common hashing functions,

are encoded in the URL-safe variant of Base64 encoding (a.k.a. Base64URL),

substitute type codes from the CESR Master code table (section 12.4.2) for ‘A’ front zero characters, and

are calculated from and embedded in the data they identify.

How does it work? How are SAIDs calculated?

The easiest way to understand a self addressing identifier is to create one. Starting with the JSON from above we walk through each of the five major concepts required to create a CESR encoded SAID.

7 Steps to Calculate and Embed a SAID

Briefly, the process is listed here. A detailed explanation and example follows this set of steps.

1. Get an object to calculate a SAID for with a digest field that will hold the SAID. In this case we use the JSON object below and the “d” field will hold the SAID. The field does not have to be empty though it can be. Prior to digest calculation it will be cleared and filled with the correct number of filler characters.

2. Calculate the quantity of Base64 characters the final encoded bytes will take up and fill the digest field with that many ‘#’ characters. This value may be looked up from a parse table like the CESR Master Code Table based on the type of hashing function used. Replace the contents of the digest field, “d” in our case, with pound sign (“#”) characters up to the number of filler characters calculated in this step. The calculated size and pad values used for this step are reused in step 4.

3. Calculate a digest of the object with the filler ‘#’ characters added using the hash function selected. This will result in a quantity of digest bytes, specifically 32 bytes for the SHA3-256 algorithm.

4. Calculate the quantity of pad bytes that, when added to the digest bytes, will give you a value length that is a multiple of 24 bits. This math is shown below. For us this is 1 pad byte, giving us 33 bytes. This value may be looked up from a parse table like the CESR Master Code Table. Perform pre-padding by prepending the pad byte to the digest bytes to get padded raw bytes.

5. Encode the padded raw bytes with the Base64 URL Safe alphabet. Pre-padding causes some characters at the start of the digest to be encoded as “A” characters, which represent zero in the Base64 URL Safe alphabet.

6. Substitute the type code for the correct number of “A” zero character(s) in the Base64 encoded string according to the CESR encoding rules from the CESR Master Code Table. Use the type code corresponding to the cryptographic hash algorithm used. In our case this is “H” because we are using the SHA3-256 algorithm. This is your SAID!

7. Place the Base64 encoded, type code substituted string (your SAID!) into the digest field in your object. This makes your object self-addressing.

3 Steps to Verify a SAID

1. Start with a SAID from an object you already have.

2. Calculate the SAID for the object using the process shown above.

3. Compare the SAID you pulled out of the object with the SAID you calculated. If they match then the SAID verifies. Otherwise the SAID does not verify.

An illustration will make clear why and how this process is done. Let’s walk through an example with a small JSON object. The concept applies to any size JSON object and objects of any serialization format such as CBOR, MessagePack, arbitrary text, or otherwise.

Example walkthrough with JSON and SHA3-256

Create Step 1: Get an object with some data and a digest field

Starting with the JSON below we have a “d” field, or digest field, in which the SAID will eventually be placed. In our case it is empty though it could start with the SAID in the “d” field and the process would still work.

JSON being SAIDified:

{ "d": "", "first": "john", "last": "doe" }

Create Step 2: Calculate the quantity of filler ‘#’ characters

The expected final size of the SAID must be known in advance in order to create a JSON object with a stable size. Calculating this quantity requires that you understand a major concept in CESR:

How to calculate pad sizes (quantity of pad bytes) and full sizes of values.

Understanding this calculation will get you most of the way towards understanding another major CESR concept called “fully qualified Base64 representation” of a cryptographic primitive. A digest is a kind of cryptographic primitive.

Knowing the size in advance, and having it be stable, is critical for CESR’s type, length, value (TLV) encoding scheme. This stable size is achieved by filling the digest field with the same number of pound sign ‘#’ characters as the size of the SAID, which looks like this:

Correct number of filler characters added to digest field:

{ "d": "############################################", "first": "john", "last": "doe" }

This enables the JSON to have the same size during and after the SAID calculation process, giving a stable size. In order to know the number of filler characters then you must calculate how many Base64 characters will be in the final SAID. Calculating how many Base64 characters are needed involves summing raw bytes and pad bytes needed to align on what is called a 24 bit boundary.

Final output has same size since Base64 characters count equals filler length

Aligning on this 24 bit boundary allows the final result with the SAID to have the same length as the version with the filler characters, 44 characters in our case:

{ "d": "HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6", "first": "john", "last": "doe" }

Remember when the “encoding” step was mentioned above? That’s where this filler character and size calculation knowledge comes in. In this encoding step you learn about the CESR-style encoding using pre-padding, pre-conversion. Knowing how many filler characters to use requires understanding the concept of aligning on a 24 bit boundary. Aligning on a 24 bit boundary is where the pre-padding of CESR comes in. This calculation of pad bytes required to align on a 24 bit boundary is the primary difference between raw, or “naive”, Base64 encoding and CESR encoding.

First let’s delve into what a 24 bit boundary is, why it matters to Base64 encoded values, and then look at some diagrams that make Base64 post-padding and CESR pre-padding clear. In doing this we jump ahead a bit and show byte diagrams of the actual encoded digest since that will help introduce later steps.

24 bit boundary – from Base64

The 24 bit boundary comes from the Base64 encoding format standard, RFC4648, specifically section 4. The reason a 24 bit boundary matters is because you can only use whole Base64 characters; there is no such thing as a fractional Base64 character. A Base64 character represents 6 bits of your raw bytes. A single byte is 8 bits. How do you reconcile the 6 bit Base64 character encoding to the 8 bits of your raw bytes? This is where a little math comes in, specifically the least common multiple.

Section 4 of the Base64 RFC 4648 describes the 24-bit groups that are the origin of the 24-bit boundary:

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet.

RFC 4648 The Base16, Base32, and Base64 Data Encodings – Section 4

Using these 24-bit groups ensures the value coming out of a Base64 decoder is the same value you put in. Separating raw bits into these 24 bit groups is where the phrase “aligning on 24 bit boundaries” comes from.

Splitting the 8-bit groups up into 6-bit groups requires a little math because 8 does not split evenly into 6. The math equation to do this is the least common multiple (LCM). LCM is used to determine the lowest number that both 8 and 6 divide into evenly, which is 24, thus the need for 24-bit groups, or 24-bit boundaries. Any value that is encoded into Base64 characters must be padded to reach a multiple of 24 bits. These 24-bit groupings allows you to cleanly convert all of your 8-bit bytes in to 6-bit Base64 characters and back to bytes without missing any bits.
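
A quick check of that arithmetic (an illustration, not part of the original post; math.lcm requires Python 3.9 or later):

import math

assert math.lcm(8, 6) == 24  # bits per group, the 24 bit boundary
assert 24 // 8 == 3          # 3 raw bytes per group
assert 24 // 6 == 4          # 4 Base64 characters per group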

Yet, if we have a stream that does not align on a 24 bit boundary then how do we create that alignment?

Pad characters on the END of a string are the answer to this in Base64.

By adding the correct number of pad characters on the end of a Base64 stream then you always end up with a value aligned on a 24 bit boundary. The ‘=’ equals sign pad characters in a plain Base64 encoding indicate the quantity of pad bits that were used in the final Base64 character adjacent to the ‘=’ pad characters.

Pad bytes at the START of the raw bytes are the answer to this in CESR.

By prepending the correct number of pad bytes on the start of a set of raw digest bytes then you always end up with a value aligned on a 24 bit boundary. Since the pad bytes are all zero bits then the resulting encoded value will start with one or more ‘A’ characters since they correspond to all zero bits in the Base64 alphabet.

Pad characters Calculation

In a plain Base64 encoding when encoding an array of bytes into Base64 that does not align on a 24 bit boundary the correct number of Base64 pad characters ‘=’ must be included. Why? Because in order to avoid data corruption in the decoded value you must know the precise original value, which means knowing how many pad characters to strip off and how many pad bits to strip out of the Base64 character adjacent to the padding. The decoder of your Base64 character needs to know how many bits of the last character used were just padding and how many were a part of your raw value.

You must signal the end of your raw bytes somehow. If, instead, you ignore, drop, or omit pad characters then you will confuse a Base64 decoder into thinking that pad bits were a part of your raw bytes, which you want to avoid because that will give you a different output value than what your input value was, meaning you would experience data corruption.

Pad characters must be included with a plain or “naïve” Base64 encoded value so that a Base64 decoder can strip the correct number of pad bits from the output, giving you your original input bytes when decoding from Base64 characters to raw bytes. This is the purpose that Base64 pad characters serve. The pad characters indicate how many pad bytes were used to encode a value in Base64.

CESR uses pad bytes and characters in a similar way, yet on the front, and with pre-conversion padding, so the rules for identifying and stripping pad bits are slightly different.

Yet, let’s stick with Base64 padding for now and come back to CESR padding later. If you are starting to get confused or lost then skip ahead to the diagrams below and come back to this explanation.

ASIDE – Calculating the quantity of Base64 pad characters based on input byte quantity

For a SHA3-256 digest this count is 44 characters. See the math below for an explanation. This number may also be found in the CESR Master Code Table for the type of algorithm used. Since we measure every raw value in terms of bytes (8 bits) then there are three possible scenarios, detailed here in the Base64 RFC, for the number of pad bytes required and thus pad characters.

A value ending with a single byte (8 bits) beyond a 24 bit boundary requires two bytes (16 bits) to meet a 24 bit boundary. This will have two ‘=’ pad characters. This means that your 8 raw bits + the 16 padding bits (two bytes) will equal 24 bits, aligning your raw value on a 24 bit boundary.

A value ending with two bytes (16 bits) beyond a 24 bit boundary requires one byte (8 bits) to align on a 24 bit boundary. This will have one ‘=’ pad character. Take the 16 bits + one pad byte (8 bits) to get to 24 bits to align on the 24 bit boundary.

A value ending with three bytes is already aligned on a 24 bit boundary (3 * 8 = 24).

You can use the modulus operator ‘%’ to determine the number of ending bits you have. For 256 bits (32 bytes * 8 bits per byte) you end up with 16 bits, or two bytes, rule number two above. So we need the equivalent of one pad byte.

How Base64 handles pad bits

The way that Base64 handles the need for pad bytes is to split the last byte into two characters, add zero bits to the last Base64 character, and then add the correct number of pad ‘=’ equals sign characters to the final output to end up with groups of 4 Base64 characters, which aligns on a 24 bit boundary because 4 * 6 bits per Base64 character = 24 bits.

What this means for a SAID – Calculating Pre-pad Bytes for CESR

In CESR padding is handled a bit differently because it repurposes the pad characters for type codes in its TLV encoding scheme. This means that what would have been zero bits representing ‘A’ characters in the Base64 encoded CESR value gets replaced with the type code, also called derivation code, in the final CESR value. To accomplish this CESR does pre-padding prior to conversion to Base64 characters. What this means for SAIDs is that all digest bytes must be padded at the front of the digest bytes to reach a multiple of 24 bits. Compare this to Base64 padding which occurs at the end of the digest bytes. Both scenarios are pictured below, Base64 padding and CESR padding.

Since the SHA3-256 digest we start with is 32 bytes, or 256 bits (not a multiple of 24), then all we need to add is one byte to get to 264 bits, which is a multiple of 24, or 33 bytes.

Now once you know the quantity of bytes that align on a 24 bit boundary you can do a simple calculation to get to the number of pad characters for your digest. Since 6 bits of every byte are put into a Base64 character (6 bit groups) then you can divide your total number of bits (264) by 6 to get the number of Base64 characters of your final digest.

264 (bits) / 6 (bits per Base64 char) = 44 (Base64 chars)

This means the total length of the resulting SAID will be 44 Base64 characters. So, you need 44 filler ‘#’ pound sign characters in your digest field of your JSON object prior to calculating the SAID.
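
The same arithmetic in a few lines of Python (an illustration, not part of the original post, assuming a 32 byte SHA3-256 digest):

digest_bytes = 32                       # SHA3-256 digest size
pad_bytes = (3 - digest_bytes % 3) % 3  # 1 pad byte for 32 bytes
total_bits = (digest_bytes + pad_bytes) * 8
said_length = total_bits // 6           # Base64 characters in the final SAID
assert pad_bytes == 1
assert total_bits == 264
assert said_length == 44                # the number of '#' filler characters needed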

Fixed width output – why is it needed?

Consistent sizing of the resulting JSON object, so that the overall output has a stable size, is the primary reason for the filler characters. In order to create the same size output both before and after the SAID is added into the JSON, there must be an equivalently sized number of pound signs (44 in this case) placed into the same field where the SAID will go. This is used in CESR encoding because CESR data types are encoded with a type, length, and value scheme (TLV scheme) that simplifies parsing. The size of the overall output is the length, or “L,” in TLV, and it only works if you have known-width data.

{ "d": "############################################", "first": "john", "last": "doe" }

Now that you know the rules for calculating the number of pad characters then we are ready to illustrate the calculation process with diagrams.

Diagram for plain “naïve” Base64 encoding of SHA3-256 digest

Base64 uses post-padding, post-conversion of pad characters, as shown in the diagram below. You start with the raw digest. All the boxes in this diagram represent the raw bytes of the digest. There is no padding yet because the value is raw and is not yet converted to Base64 characters.

Binary bits of 32 byte SHA3-256 digest of above JSON with ‘#’ filler

For those following along in code the raw bytes of the 32 byte SHA3-256 digest of the JSON above (with the ‘#’ filler characters) are represented in binary as follows:

1111001001011011010101100010111010011111011001101111000110001101000010000000010010000011100010110000000000000001100111110110110000101001010000110100100101001000111110110110011100010001110100110010011010101000010001000100101011100100000011111110100011111010

Take a look at the last two bytes: 11101000 and 11111010. These factor into the last two characters adjacent to the pad character, as you see below.

Encode this 32 byte digest to Base64 URL Safe and you get:

What happened here is that four bits (1010) of the last byte (11111010) were encoded into the last character, lowercase ‘o’, adjacent to the pad character. If you look at the value for lowercase ‘o’ in the Base64 alphabet you will see that it has the bit pattern 101000. Yet it only pulled four bits from the last byte of 11111010, so where did the last two bits (00) come from? They were added in by the Base64 encoder. These two pad bits are why the corresponding final value has a single equals sign ‘=’ pad character. That instructs the Base64 decoder to strip two bits from the last character during the decoding process:

IMPORTANT: Base64 does not add the padding to the raw bytes prior to conversion. Instead it adds the padding while converting the 6 bit groups of the raw bytes into Base64 characters.

Because 32 bytes (256 bits) does not align evenly on a 24 bit boundary (it is not a multiple of 24), the Base64 encoder splits the last byte across two different Base64 characters, since 8 bits does not fit evenly into one 6 bit group and must be spread across two 6-bit groups. Each of these 6 bit groups gets its own Base64 character. In this case, the last two bytes 11101000 and 11111010 get spread across the last two characters ‘P’ (001111) and ‘o’ (101000).

Because of how the math works when splitting the 8-bit byte groups into 6-bit Base64 character groups the ‘o’ character got four bits from the very end of the digest. Yet four bits is not enough for a Base64 character so the Base64 encoder adds two zero bits on the end, signified with white boxes containing zeroes. Before the pad character is added then we are at 43 Base64 characters (6 bit groups, 258 bits), which is not a multiple of 24 bits. When the pad character ‘=’ is added then we get to a 44 characters (264), which is a multiple of 24 bits, meaning the encoding completed successfully.

Base64 Encoded SHA3-256 Digest

With the fully padded value you end up with a valid, encoded, Base64 value that looks like the following bit diagram:

The C2 character at the end shares some bits with the raw bytes of the digest and also contains some padding zero bits. The last character, C1, is an equals sign ‘=’ pad character. The fact that there is one pad character indicates to the Base64 decoder that there are two zeroed pad bits to remove from the last character, ‘C2’, during decoding in order to get back to the original digest bytes.
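
You can see this plain Base64 behaviour in code (an illustration, not part of the original post): encoding the 32 byte digest without any CESR pre-padding produces 43 data characters plus one ‘=’ pad character.

import hashlib
from base64 import urlsafe_b64encode

raw_value = b'{"d":"' + b'#' * 44 + b'","first":"john","last":"doe"}'
digest = hashlib.sha3_256(raw_value).digest()  # 32 bytes, 256 bits
naive = urlsafe_b64encode(digest).decode()     # plain Base64 URL Safe, no pre-padding
assert len(naive) == 44                        # 43 data characters plus one pad character
assert naive.endswith("=")                     # the single '=' signals two pad bits in the last character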

‘=’ is wasted space?

You could consider the pad characters ‘=’ as wasted space that could be useful if repurposed. All of the pad bits used for the equals sign could represent something. This is exactly what CESR does except it moves the padding to the front of the bytes so that it can have a uniform TLV encoding format. TLV encoding formats require the type character to be at the front of the value, so using post-padding like Base64 does would not work.

Along these same lines, SAIDs do not use Base64-style padding because it does not enable separability of individual concatenated values due to the fact that there is no easy way to cleanly and reliably separate individual values out of a Base64 encoded stream of bytes. The CESR specification introduction mentions this:

This Composability property enables the round-trip conversion en-masse of concatenated Primitives between the text domain and binary domain while maintaining the separability of individual Primitives.

Composable Event Streaming Representation ToIP specification – Dr. Sam Smith

Now that you understand how the plain or “naïve” Base64 encoding works then we turn our attention to CESR style pre-padding.

CESR Byte Padding: Pre-padding, Pre-conversion

In CESR the padding of values occurs with the raw bytes prior to encoding to Base64 as shown below in the white box containing ‘B33.’

What this means is that the raw value, prior to conversion, already aligns on a 24 bit boundary. Due to this alignment pre-conversion then there will never be any Base64 pad characters ‘=’ in the output.

How many bytes to prepend?

How do you know how many bytes to prepend? With a calculation similar to the one we did above to find the number of filler characters.

Since the SHA3-256 digest we start with is 32 bytes, or 256 bits (not a multiple of 24), then all we need to add is one byte to get to 264 bits, which is a multiple of 24, or 33 bytes.

Again, once you know the quantity of bytes that align on a 24 bit boundary you can do a simple calculation to get to the number of pad characters for your digest. Since 6 bits of every byte are put into a Base64 character (6 bit groups) then you can divide your total number of bits (264) by 6 to get the number of Base64 characters of your final digest.

264 (bits) / 6 (bits per Base64 character) = 44 (Base64 Characters)

So 44 will be the quantity of filler characters to put into the JSON object in order to calculate a SAID.

What happens when prepending bytes for CESR style encodings?

When encoding a value that requires padding with CESR-style padding (up front), instead of ‘=’ at the end like Base64-style padding would produce, you end up with ‘A’ characters on the front of your encoded value. You also end up with the one character adjacent to the ‘A’ character(s) including some pad bits and some raw bits, as shown below in the bit diagram.

The intermediate encoded value looks like the below value that is not yet a SAID. This is not yet a SAID because the ‘A’ character has not yet been replaced with a type code from the TLV scheme indicating this is a SHA3-256 digest.

This ‘A’ character represents all zero bits (000000) in the Base64 alphabet.

In binary the full, pre-padded digest value (all 33 bytes) looks like the following. Notice the zero bits at the front.

000000001111001001011011010101100010111010011111011001101111000110001101000010000000010010000011100010110000000000000001100111110110110000101001010000110100100101001000111110110110011100010001110100110010011010101000010001000100101011100100000011111110100011111010

The first two bytes are 00000000 and 11110010 which get encoded into Base64 as shown below. Six of the zero pad bits get encoded as an ‘A’ character and two of the zero pad bits get included in the capital ‘P’ character which also has four bits from the next raw byte of data from the digest.
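
The same thing in code (an illustration, not part of the original post): pre-pad the digest with one zero byte and inspect the first two encoded characters.

import hashlib
from base64 import urlsafe_b64encode

raw_value = b'{"d":"' + b'#' * 44 + b'","first":"john","last":"doe"}'
digest = hashlib.sha3_256(raw_value).digest()
encoded = urlsafe_b64encode(b'\x00' + digest).decode()
assert encoded[0] == 'A'   # six of the zero pad bits
assert encoded[1] == 'P'   # two zero pad bits plus the first four bits of the digest
assert '=' not in encoded  # 33 bytes align on a 24 bit boundary, so no '=' padding is needed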

Bit diagram of Base64 encoded, CESR pre-padded raw value.

This diagram illustrates how CESR does pre-padding with pad bytes of zero bits prior to performing a Base64 encoding on the fully padded raw value. The next diagram of characters shows what a fully padded, encoded, CESR-style value looks like.

As you can see, the padding is at the front of the encoded value rather than the back like Base64 does. And the character with shared pad and raw bits is adjacent to the pad character at the front of the Base64 encoded value.

To get to the final SAID then you replace the ‘A’ character with the appropriate type code, or derivation code, yet we are getting ahead of ourselves a bit too much. Let’s now get into the calculation of the digest.

This step showed you how to calculate the appropriate number of filler ‘#’ pound sign characters to put in to the digest field in your JSON object. The next step shows you how to calculate a digest of that JSON object.

Creation Step 3: Calculate a digest of the data

When calculating a digest you take the data with the correct number of filler characters added to the digest field and simply take a digest of it. So in our case we would take a digest of the following:

{ "d": "############################################", "first": "john", "last": "doe" }

In Python taking a digest of this data would be as simple as the following:

import hashlib

raw_value = b'{"d":"############################################","first":"john","last":"doe"}'
digest = hashlib.sha3_256(raw_value).digest() # <-- the sha3_256 hash function

This is a simple step and is very similar for any other algorithm such as SHA2-256, Blake3-256 or otherwise. You use the desired type of hash function.

The only other thing to be aware of here is that if you create a digest that is sized differently than 32 bytes, such as a SHA3-512 digest (64 bytes) then you need to also change the number of pad bytes, which gets into the next step.
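
For example, the same pad arithmetic for a 64 byte digest looks like this (an illustration, not part of the original post; note that the CESR type code for larger digests is different and is not shown here):

digest_bytes = 64                       # e.g. a SHA3-512 digest
pad_bytes = (3 - digest_bytes % 3) % 3  # 2 pad bytes for 64 bytes
total_chars = (digest_bytes + pad_bytes) * 8 // 6
assert pad_bytes == 2
assert total_chars == 88                # filler length and encoded length for a 64 byte digest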

Creation Step 4: Calculate the quantity of pad bytes

The calculation for the quantity of pad bytes is very similar to the calculation for the quantity of filler ‘#’ characters needed in Step 2. In fact, it is a subset of that calculation. The goal with pad characters is to make sure that the final value aligns on a 24 bit boundary as mentioned above.

For example, since the SHA3-256 digest we start with is 32 bytes, or 256 bits (not a multiple of 24), then all we need to add is one byte to get to 264 bits, which is a multiple of 24, or 33 bytes.

Deeper into Modulus Math for Pad Bytes

To get a bit deeper into the math, one way to do this calculation with the modulus operator is to find out how many bytes are necessary to completely fill a 3 byte group. Since 3-byte groups are multiples of 24 bits, you can use a modulus calculation to see how far away you are from filling a three byte group by doing a modulus 3 operation in two steps:

Step 1: take bytes mod 3

32 bytes mod 3 = 2 (bytes)

meaning there are two bytes already in the last group of three (24 bit boundary).

Step 2: subtract bytes in group from group size

So to see how many bytes you must add to get to the 24 bit boundary (3 byte group) you subtract the quantity of bytes you have from the group size:

3 (group size) – 2 (bytes in group) = 1 (pad bytes needed to fill group)

Due to how modulus arithmetic works you will only ever have three possible values from this equation:

3 – (bytes mod 3) = 0 (pad bytes)

3 – (bytes mod 3) = 1 (pad bytes)

3 – (bytes mod 3) = 2 (pad bytes)

You never have to worry about three pad bytes because that would be an even multiple of 24 which means your raw value would already align on a 24 bit boundary and thus not need any pad bytes.

So, to review, for us the calculation of (3 - (32 mod 3)) = 1 pad byte gives us a single pad byte to be prepended to our raw value, as shown below in the ‘B33’ box.
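
A small helper function (an illustration, not part of the original post) captures this rule for any digest size:

def pad_bytes_needed(raw_size: int) -> int:
    """Zero bytes to prepend so that raw_size plus padding is a multiple of 3 bytes (24 bits)."""
    return (3 - (raw_size % 3)) % 3

assert pad_bytes_needed(32) == 1  # SHA3-256, SHA2-256, Blake3-256 digests
assert pad_bytes_needed(64) == 2  # 64 byte digests such as SHA3-512
assert pad_bytes_needed(33) == 0  # already aligned on a 24 bit boundary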

As mentioned before, CESR does pre-padding, pre-conversion which means that the pad byte we found we need is added to the front of the array of raw bytes for the SHA3-256 digest. The next step covers encoding this padded raw value.

Creation Step 5: Base64 URL Safe Encode the padded raw bytes

Now that the raw value from Step 4 is properly padded then you encode it with Base64 URL Safe encoding. CESR uses Base64 URL Safe encoding rather than plain Base64 encoding so that CESR values can safely be used in URLs and filenames.

import hashlib
from base64 import urlsafe_b64encode

raw_value = b'{"d":"############################################","first":"john","last":"doe"}'
digest = hashlib.sha3_256(raw_value).digest()
padded_digest = b'\x00' + digest
encoded = urlsafe_b64encode(padded_digest) # encode to base64
assert encoded == b'APJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6'
assert len(encoded) == 44

Now that you have the Base64 URL Safe encoded value then you are ready to finish off this SAID creation by replacing the ‘A’ pad character at the front of the encoded value with the appropriate value from the CESR Master Code Table.

Creation Step 6: Substitute Type Code for the front ‘A’ character(s)

When CESR pre-padded the raw value to get to a 24 bit boundary the purpose of that was to be able to repurpose the wasted space of the pad character for a type code in CESR’s TLV encoding scheme. The ‘A’ character at the front of the value in this scheme is considered to be a pad character. This pad ‘A’ character will be replaced with the appropriate type code, or derivation code in CESR parlance, from the CESR Master Code Table.

For a SHA3-256 digest that type code is ‘H’ as seen in the following subset of the CESR Master Code Table.

The substitution gives us a final value of HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6 as seen in the following substitution diagram.

The substitution of the ‘A’ character with the ‘H’ character is the final part of what is called CESR encoding a raw digest value into a CESR-style self addressing identifier. This SAID is a front-padded, Base64 encoded, and type-code substituted, string of Base64 characters.

The final value can be created by the code as follows:

import hashlib
from base64 import urlsafe_b64encode

raw_value = b'{"d":"############################################","first":"john","last":"doe"}'
digest = hashlib.sha3_256(raw_value).digest()
padded_digest = b'\x00' + digest
encoded = urlsafe_b64encode(padded_digest)
b64_str_list = list(encoded.decode()) # convert bytes to string of chars for easy replacement of 'A'
b64_str_list[0] = 'H' # replace first 'A' character with 'H' type code
b64_str = ''.join(b64_str_list) # convert string of chars to string with .join()
assert b64_str == 'HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6'
assert len(b64_str) == 44 # length should still be 44 characters, 264 base64 bits, a multiple of 24 bits

Creation Step 7: Place the Front-Padded, Base64 encoded, Type-code Substituted string in the digest field

Now we can take this correctly padded, CESR encoded value and place it into the digest field in our JSON object, replacing the filler ‘#’ characters with the final, valid SAID:

{ "d": "HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6", "first": "john", "last": "doe" }

This takes us back to where we started off, with a valid SAID and a SAIDified JSON object.

What about verification?

What is nice about verification is that it is as simple as calculating the SAID of the JSON object again and comparing that to the SAID you are handed.

Verification Step 1: Start with a SAID from the object you already have

Say you are starting with the below object that has already had a SAID calculated and embedded in the digest field, the “d” field here.

{ "d": "HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6", "first": "john", "last": "doe" }

To get the SAID from this object you extract the value of the “d” field, giving you HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6

Verification Step 2: Calculate the SAID of the object using the SAID creation steps

Verification is easy because all you need to do is take steps 1 through 6 above and re-calculate the SAID on the JSON object provided. Once you have recalculated the SAID, which will be HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6 again, you can perform the comparison in step 3.

Verification Step 3: Compare the SAID from the object to the calculated SAID

If the SAID the object started with matches the SAID you calculated from the object then you know the object has not been changed and that the SAID is valid. Otherwise either your SAID is invalid or the object has changed.

Review Calculating a SAID

Now you understand how we SAIDify a JSON object by doing the following seven step process:

1. Start with a JSON object we want to add a SAID to that has a digest field.

2. Calculate the quantity of Base64 characters the final, pre-padded, encoded raw digest bytes (SAID) will take up and fill the digest field with that many ‘#’ characters.

3. Calculate a digest of the bytes of the JSON object after the ‘#’ filler characters are added.

4. Calculate the quantity of pad bytes needed to align on a 24 bit boundary and prepend that to the raw bytes for a digest.

5. Encode the padded raw bytes with the Base64URLSafe alphabet.

6. Substitute the appropriate type code in place of the ‘A’ character(s) at the front of the encoded string. This final value is your SAID.

7. Place the final SAID value into the digest field of your JSON object.

Pre-padding and type code substitution prior to Base64 encoding is the essence of CESR-style self addressing identifiers. The steps above may seem overwhelming at first, though once you mentally anchor in that CESR pads at the start and that padding gives you ‘A’ characters you can reuse for type codes then you have mastered the fundamentals of what makes CESR style SAIDs work.

Verifying a SAID

Verification of a SAID is easy because you just calculate it again from the original JSON object, or other data object you are using. If the SAIDs match then it verifies; if they don’t then the data changed.
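
Putting creation and verification together, here is a compact sketch (an illustration under the same assumptions as the earlier examples: SHA3-256, type code ‘H’, a 44 character SAID, compact JSON serialization, and a “d” digest field; for production use, prefer one of the libraries listed below):

import hashlib
import json
from base64 import urlsafe_b64encode

def saidify(data: dict, label: str = "d") -> tuple[str, dict]:
    sad = dict(data)
    sad[label] = "#" * 44                                   # filler for a 32 byte digest
    raw = json.dumps(sad, separators=(",", ":")).encode()   # compact JSON, matching the examples above
    digest = hashlib.sha3_256(raw).digest()
    encoded = urlsafe_b64encode(b"\x00" + digest).decode()  # pre-pad with one zero byte, then encode
    said = "H" + encoded[1:]                                # replace the leading 'A' with the type code
    sad[label] = said
    return said, sad

def verify(sad: dict, label: str = "d") -> bool:
    return saidify(sad, label)[0] == sad[label]

said, sad = saidify({"d": "", "first": "john", "last": "doe"})
assert said == "HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6"
assert verify(sad)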

Extra Learning Alert – fully qualified Base64 primitive

And, as a nice side note, you happen to now know what the phrase “fully qualified base64 primitives” in KERIpy means. All that means is that your encoded value has been pre-padded, pre-conversion, and has had its type code added to the front, as we did here with substitution, with the exception that some CESR primitives

Give me a library please! I don’t want to manage these details

In case this article has convinced you that you do not ever again want to worry about the vagaries of aligning on 24 bit boundaries for Base64 or CESR values then you are in luck. There are multiple implementations of the SAID process that can meet your needs in a variety of different languages.

The Python reference implementation in Web Of Trust’s KERIpy’s Saider.saidify.

The Human Colossus Foundation’s Rust implementation with WASM bindings for their JavaScript package. See their cool SAID generator and verifier demo here where you can try a whole list of different algorithms.

SAIDify, my own Typescript implementation of the SAID creation process.

Implementations

Web Of Trust KERIpy Python

The Python example below from KERIpy shows a unit test showing the usage of the KERIpy Saider.saidify library code to calculate a SAID. The SAID is stored in the .qb64 property of Saider. The term qb64 stands for “qualified base64” which means a left-padded, Base64 encoded, type code substituted value as described above.

import json

from keri.core.coring import MtrDex, Saider

def test_saidify_john_doe():
    code = MtrDex.SHA3_256
    ser0 = b'{"d": "", "first": "john", "last": "doe"}'
    sad0 = json.loads(ser0)
    saider, sad = Saider.saidify(sad=sad0, code=code)
    assert saider.qb64 == 'HPJbVi6fZvGNCASDiwABn2wpQ0lI-2cR0yaoRErkD-j6'

Human Colossus Foundation Rust SAID demo and test code

Start with their cool demo site of generating and verifying SAIDs:

If you want to dive into their code, the linked test basic_derive_test shows the Rust code for the cool SAD macro #[derive(SAD, Serialize)] that, along with the #[said] field attribute for the SAID digest field, can turn any Rust struct into a self-verifying data structure.

use said::derivation::HashFunctionCode;
use said::sad::SAD;
use said::version::format::SerializationFormats;
use said::SelfAddressingIdentifier;
use serde::Serialize;

#[test]
pub fn basic_derive_test() {
    #[derive(SAD, Serialize)]
    struct Something {
        pub text: String,
        #[said]
        pub d: Option<SelfAddressingIdentifier>,
    }

    let mut something = Something {
        text: "Hello world".to_string(),
        d: None,
    };

    let code = HashFunctionCode::Blake3_256;
    let format = SerializationFormats::JSON;
    something.compute_digest(&code, &format);

    let computed_digest = something.d.as_ref();
    let derivation_data = something.derivation_data(&code, &format);

    assert_eq!(
        format!(
            r#"{{"text":"Hello world","d":"{}"}}"#,
            "############################################"
        ),
        String::from_utf8(derivation_data.clone()).unwrap()
    );

    assert_eq!(
        computed_digest,
        Some(
            &"EF-7wdNGXqgO4aoVxRpdWELCx_MkMMjx7aKg9sqzjKwI"
                .parse()
                .unwrap()
        )
    );

    assert!(something
        .d
        .as_ref()
        .unwrap()
        .verify_binding(&something.derivation_data(&code, &format)));
}

SAIDify

If you want a TypeScript library that is about 530 lines of code, you can go with my SAIDify library. The example below shows how to use it from TypeScript.

Start with an NPM install

npm install saidify

Then you can use the saidify(data, label) function to SAIDify any JavaScript object, as long as you indicate which field is the digest field (the “label”), which defaults to the “d” field.

import { saidify, verify } from 'saidify'

// create data to become self-addressing
const myData = {
  a: 1,
  b: 2,
  d: '',
}
const label = 'd'
const [said, sad] = saidify(myData, label)
// said is self-addressing identifier
// sad is self-addressing data
console.log(said)

// ...Vitest test assertion
expect(said).toEqual('ELLbizIr2FJLHexNkiLZpsTWfhwUmZUicuhmoZ9049Hz')

// verify self addressing identifier
const computedSAID = 'ELLbizIr2FJLHexNkiLZpsTWfhwUmZUicuhmoZ9049Hz'
const doesVerify = verify(sad, computedSAID, label) // can verify with original myData or sad

// ...Vitest test assertion
expect(doesVerify).toEqual(true)

Conclusion

The key takeaways from calculating SAIDs are:

- Use pre-padded bytes to align on a 24 bit boundary prior to encoding as Base64 characters.
- Substitute type codes in place of the leading ‘A’ character(s) of a SAID.
- It is easy to choose different algorithms for the SAID calculation process. Just make sure you use a code from the CESR Master Code Table if you want to be CESR compliant.
- There are multiple implementations of the SAID algorithm you can use.

Now go make some SAIDs!

References:
- HCF oca-spec #58
- RFC 4648: The Base16, Base32, and Base64 Data Encodings, specifically section 5
- Composable Event Streaming Representation (CESR) ToIP Specification, specifically section 12.6
- Self Addressing Identifier IETF draft specification
- SADs, SAIDs, and ACDCs video presentation by Daniel Hardman

Michael Ruminer

Anthropic’s Contextual Retrieval

One of the problems I have encountered in trying to do Retrieval Augmented Generation (RAG) where a complete single document was not uploaded for queries on that document alone is that chunks created from documents for embedding often lose context. I have often considered what would be the outcome if one were to put some limited but useful context along with each chunk. Now I know. Introducing Ant

One of the problems I have encountered in trying to do Retrieval Augmented Generation (RAG), when a complete single document is not uploaded for queries on that document alone, is that the chunks created from documents for embedding often lose context. I have often wondered what the outcome would be if one were to attach some limited but useful context to each chunk. Now I know. Introducing Anthropic's contextual retrieval concept.

One of my concerns with generating additional context for each chunk is that you’d probably need to pass the entire document being chunked as context along with each individual chunk. That would be very expensive and slow. Now, with Anthropic’s Claude prompt caching, the cost and latency are reduced significantly, and chunk contexting (is contexting a word?) seems actually doable. An initial prompt including the full document can be made once, with that prompt cached for future prompt references.
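
Roughly, I imagine the per-chunk contexting could look something like the following sketch, which uses the Anthropic Python SDK with prompt caching; the model name, beta header, and prompt wording are illustrative assumptions rather than Anthropic’s exact recipe.

import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

def contextualize_chunk(full_document, chunk):
    # Cache the full document in the system prompt so that generating context for
    # each chunk does not re-send (or fully re-pay for) the whole document every time.
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model name
        max_tokens=200,
        extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
        system=[
            {
                "type": "text",
                "text": "Here is the full document:\n\n" + full_document,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[
            {
                "role": "user",
                "content": "Write a short context that situates the following chunk "
                           "within the document, to be prepended to the chunk before "
                           "embedding:\n\n" + chunk,
            }
        ],
    )
    return response.content[0].text + "\n\n" + chunk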

I plan to try this out.

Check out the Anthropic “Introducing Contextual Retrieval” post for greater details.

Saturday, 21. September 2024

Jon Udell

deo absente deum culpa

On a recent trip I saw this pair of Latin phrases tattooed on the back of a flight attendant’s arms: Left: Deo absente. Right: Deum culpa. I took Latin in middle school, and could guess what the combination might mean. It’s not a common construction, and a search seems to confirm my guess. Both Google … Continue reading deo absente deum culpa

On a recent trip I saw this pair of Latin phrases tattooed on the back of a flight attendant’s arms:

Left: Deo absente. Right: Deum culpa.

I took Latin in middle school, and could guess what the combination might mean. It’s not a common construction, and a search seems to confirm my guess. Both Google and Bing take you to a couple of Reddit posts in r/Latin.

Would this be the correct translation?

A song I like, Deus in absentia by Ghost, has that line in it intending to mean “In the absence of God”, so I was looking into alternate translations/syntax of the phrase intending to mean “In the absence of God; Blame/Fault God”. Would this make sense: “Deum in absente; Culpa Deus” or “Deus Culpa”?

Does the phrase “Deus In Absentia, Deus Culpa” make sense?

I’m using this for a tattoo and want to be absolutely sure it works in the sense of ‘In the absence of God, blame God’. All help appreciated!

Is that the same person I saw? If so, the responses in r/Latin seem to have guided them to the final text inked on their arms. And if so, the message is essentially what I had guessed. The intent of the message, though, is open to interpretation. I’m not quite sure how to take it. What do you think it means? Would it have been rude to ask?

Friday, 20. September 2024

Jon Udell

SQL translation from Postgres to SQLite and DuckDB

Powerpipe dashboards can now connect not only to Steampipe but also to SQLite and DuckDB. This creates a combinatorial explosion of possibilities, including dashboards that use SQL to visualize large datasets read from Parquet files by DuckDB. SQL Translation From Postgres to SQLite and DuckDB Part of the LLM series at The New Stack.

Powerpipe dashboards can now connect not only to Steampipe but also to SQLite and DuckDB. This creates a combinatorial explosion of possibilities, including dashboards that use SQL to visualize large datasets read from Parquet files by DuckDB.

SQL Translation From Postgres to SQLite and DuckDB

Part of the LLM series at The New Stack.

Thursday, 19. September 2024

Michael Ruminer

A Great AI RAG Resource

I came across a great AI Retrieval Augmented Generation resource. It is a Github repo: Advanced RAG Techniques: Elevating Your Retrieval-Augmented Generation Systems.I’ll just copy and paste their introduction here. “Welcome to one of the most comprehensive and dynamic collections of Retrieval-Augmented Generation (RAG) tutorials available today. This repository serves as a hub for cutting-edge t

I came across a great AI Retrieval Augmented Generation resource.
It is a GitHub repo: Advanced RAG Techniques: Elevating Your Retrieval-Augmented Generation Systems. I’ll just copy and paste their introduction here.

“Welcome to one of the most comprehensive and dynamic collections of Retrieval-Augmented Generation (RAG) tutorials available today. This repository serves as a hub for cutting-edge techniques aimed at enhancing the accuracy, efficiency, and contextual richness of RAG systems.”

All I can say is, wow. It really covers a lot of ground. I plan to dig into it and will report back.

Monday, 16. September 2024

Heather Vescent

Ask A [cybersecurity] Futurist

Welcome to the inaugural “Ask a [cybersecurity] futurist column. This month’s question comes from Anna Borg. She asks: How can we change the “rules in the game” and make it less interesting to use the internet for attacks on organisations and societies including democracy. We will never “win the game” if we are forced to play defence play 24/7, 365 days per year. How do we change the incent

Welcome to the inaugural “Ask a [cybersecurity] futurist column. This month’s question comes from Anna Borg.

She asks:
How can we change the “rules in the game” and make it less interesting to use the internet for attacks on organisations and societies including democracy. We will never “win the game” if we are forced to play defence play 24/7, 365 days per year. How do we change the incentives for attackers?

I won’t sugar coat it, this is a complex and interconnected problem. Our increased cyber attacks are a symptom of a much larger problem. (Which means the solutions will not just address the cybersecurity problem at the core of this question, but bigger problems that society deals with.)

The Complexity of Cybersecurity Threats

For defense teams, the current state of cybersecurity feels like an endless battle, with organizations and societies constantly reacting to the latest threats. For hackers, it’s a landscape with abundant vulnerabilities if one is willing to put in the work to find (or create) the magic key — like a recent Advanced Persistent Teenager.

Cybersecurity is a long imbalanced game of whack-a-mole. Defenders have to guard everything, while attackers have the luxury of focusing. So how can you move out of a reactive mode to stay one step ahead of the next attack? And further, how can we make it even pointless for attackers to target us in the first place?

Understanding Hacker Motivation

To change the rules, we need to understand why cyber attacks are happening. What drives someone to become a black hat hacker? Or to start a black hat collective? Or join an elite nation state hacking team? I’ve spoken with security professionals, hackers and analyzed research data, and the motivations come down to: money, espionage, and the thrill of the con.

Viktor and Giselle from Better Call Saul

In a survey I conducted last year with identity-security experts, financial gain and espionage were the top motivations, followed by a desire for notoriety or the thrill of hacking. Most hackers today aren’t lone wolves; they’re part of organized collectives that pool their skills and resources to hit bigger, complex targets. They might even employ the Silicon Valley approved “AAS/as a service” business model.

There’s another factor that’s often overlooked: socioeconomic conditions. During my research for the paper Shifting Paradigms: Using Strategic Foresight to Plan for Security Evolution, I was genuinely surprised to hear about the overproduction of well-educated young people unable to find good jobs after their security education. There are very few well paid entry level security jobs — even in the US and developed nations.

Changing the Incentives

So how do we change the incentives to make cyber-attacks less appealing and less profitable for would-be attackers?

I’m going to skip over the obvious answer of creating/using better security technology. Everyone is racing to implement better tech solutions, but this is just a band aid — admittedly a really big band aid. I’m going to talk about non-tech solutions.

Economic Solutions

If it’s true that we are overproducing highly educated security people, could we solve some security problems by having full employment for all educated security experts?

One scenario in Shifting Paradigms envisioned this idea.

5.10 Scenario 9: Money, Cash, Hoes
In this scenario, whose name is inspired by the Jay Z song, security defensive work has become so attractive and well-paid that black-hat work is no longer attractive, and people with security-relevant technical skills are almost universally employed in defensive jobs. Governments can still hire offensive hackers, but criminal syndicates have a very hard time competing with white-hat employers.
Changes from Today: Employers pay great salaries, offer significant flexibility and benefits, and recruit extremely aggressively, especially in poor countries with good technical talent and in ungoverned spaces. There are many good entry level security jobs.

Could one of the most effective ways to reduce cybercrime be to increase legitimate employment opportunities for those with technical skills? If well-paying jobs in cybersecurity or other tech fields were more readily available, especially in economically disadvantaged regions, would the incentive to engage in illegal activities diminish?

I think this scenario is illustrated in Wired’s recent article about Priscila Barbosa, Queen of the Rideshare Mafia, which describes the elaborate identity theft and fraud Priscila engaged in. Barbosa took IT classes back in her native Brazil and was a successful entrepreneur until the recession caused business to disappear. She came to find her fortune in the US — which she did. But because she overstayed her visa, she could not work legitimate jobs. I’d like to imagine, that if there was a legitimate way to work in the US, she would have never participated in criminal activities. And maybe, if there had been good tech jobs in Brazil, she might not have even needed to seek her fortune in the US.

In my view, Barbosa is a victim of economic lack of opportunity.

Disrupting [Criminal] Business Models

What are the business models that make cybercrime profitable? I remember Kim Cameron, Identity Architect, saying (not sure if this is his quote or he was quoting someone), that “the most profitable business model on the internet is crime. Back in the 90s and again now.”

Even entrepreneurial criminals have embraced the successful “as a Service” software development model. But to me, this echoes back to the economic themes. If there were legitimate economic paths half as successful, would there be less interest in criminal activity?

Nation-State Attacks

Nation-state hackers are a whole different beast. Their motivations are tied to geopolitical loyalty and belief. To reduce these types of attacks, we need to focus on improving global cooperation and reducing international tensions. This might sound idealistic, but fostering peace and global coexistence could significantly decrease the incentive for nations to engage in cyber warfare.

Reducing the Thrill

Then there’s the issue of hacking for fun, cultural subversion, or “the thrill of the con.” This is harder to combat, but one approach is to create more avenues for ethical hacking while embracing the dark side of hackers. Bug bounty programs, for example, allow hackers to flex their skills in a legal, constructive way. These programs are subversive in their own right but channel that energy into something positive.

Finding Solutions

Is the solution simply a transparent and interdependent global economy with legitimate well paid jobs so we can all live our best lives without stepping on someone to do so?

I don’t know about you, but that feels like a big and impossible solution — perhaps a bit too utopian. So, I want to leave Anna with some reasonable actions she or her company might take.

- Entry level security roles: Increase well paid jobs & support development programs to transition people into well paid jobs. As part of this, eliminate the rockstar mentality, and hire someone who is good enough for the job and then treat them well.
- Embrace the Lulz factor: Embrace the shadow of cybersecurity by engaging and encouraging the exact hacker mindset but to improve security — and again, offer an economic incentive. Bug bounties are both subversive and helpful.

Both of these ideas could have a corporate initiative with a budget, KPI and program manager. They could be funded out of HR or professional development, IT or risk management line items. Companies could partner with universities, hacker collectives and government agencies, maybe sponsoring a hackathon or participating in a local cyber wargame activity.

Got Questions

If you could see the future of cybersecurity, what would you ask? Submit your question for a chance to be featured in the next “Ask a Cybersecurity Futurist” column.

If you found this insightful please consider booking me for a keynote talk or a lunch and learn session. Or sign up for the Future of Cybersecurity newsletter.


Michael Ruminer

RAG Disappointment and Azure Open AI Success With RAG

Recently, I have been working on programming small modules for Retrieval Augmented Generation (RAG) using Open AI. Also recently, I did a Coursera class on advanced RAG and have taken in several videos and posts on it. I have used samples from those sources and tested using various LLMs to generate simple Python to perform RAG with Open AI. In general , I have been disappointed with the outco

Recently, I have been working on programming small modules for Retrieval Augmented Generation (RAG) using Open AI. Also recently, I did a Coursera class on advanced RAG and have taken in several videos and posts on it. I have used samples from those sources and tested using various LLMs to generate simple Python to perform RAG with Open AI. In general, I have been disappointed with the outcomes, until I tried Azure Open AI with Azure AI semantic search.

Disappointment

My general disappointment has come from the self-coded RAG attempts on a single PDF. The basic approach, sketched in code after the list below, has been:

- take a single PDF (not a small one, about 43,000 tokens when uploaded whole to the Open AI API) and extract the text using PyPDF
- chunk the text using, at best, recursive text and/or sentence transformer splitting; sometimes just a naive simple character count split
- embed the chunks, trying Chroma or text-embedding-3-small
- query the collection using Chroma or FAISS, and in one instance also with a simple augmented prompt
- call the LLM with the initial prompt and context from the embeddings
- the prompt asked for a list of 10 principles in the document that were all outlined in a single paragraph
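
For illustration, a minimal sketch of that kind of pipeline might look like the following; the file name, chunk size, models and prompt are placeholders, not the exact code from the examples I used.

from openai import OpenAI
from pypdf import PdfReader
import chromadb

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(texts):
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

# extract the text from the PDF
reader = PdfReader("document.pdf")
text = "\n".join(page.extract_text() or "" for page in reader.pages)

# naive fixed-size character chunking
chunk_size = 1000
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# embed the chunks and store them in an in-memory Chroma collection
collection = chromadb.Client().create_collection("doc")
collection.add(ids=[str(i) for i in range(len(chunks))], documents=chunks, embeddings=embed(chunks))

# retrieve the nearest chunks for the question
question = "List the 10 principles outlined in the document."
results = collection.query(query_embeddings=embed([question]), n_results=5)
context = "\n\n".join(results["documents"][0])

# call the LLM with the prompt plus the retrieved context
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(completion.choices[0].message.content)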

It’s nothing sophisticated by any stretch, but it corresponded with the examples I had available.

The results were abysmal. I’m not really surprised. I’m not sure how such an approach could do very well with simple chunking, embeddings, and basically running off keyword proximity. But since these were supposed to be examples of RAG in use, I expected better results. There were only two times I received good results. One of those times I’ll outline below; the other was when I didn’t parse or embed and just passed the entire document as context. Of course, the latter worked well, but that was not the exercise I was after.

Success

My successful attempt came when I didn’t create the code directly but used Microsoft Azure Open AI playground. I have no doubt that coding it up would have worked just as well since it relies on the Azure AI infrastructure and the code would be little more than passing a prompt to the Azure LLM instance and getting the results. Here is what it consisted of:

- setup of an Azure Open AI instance using the gpt-4o-mini model and, I think, the text-embedding-3-small embedding model
- setup of an Azure AI Search instance with semantic search and indexing pointed to a blob storage container with a single PDF in it

I think that was it. I then went to the Azure Open AI Studio playground, grounded the model with the search instance, provided my simple prompt, and got back the desired results. Was it the semantic search that made it work well? I suspect it helped a lot. I need to try it without the semantic search and see what happens. Sorry, I forgot to try that scenario.

Recap

All in all, I was very disappointed with the RAG results, as I coded it, especially as they were based on examples or AI generated single document RAG code. But, I was very pleased with the Azure test and I think the semantic search made all the difference.


Damien Bod

Using Elasticsearch with .NET Aspire

This post shows how to use Elasticsearch in .NET Aspire. Elasticsearch is set up to use HTTPS with the dotnet developer certificates, and a simple client can be implemented to query the data. Code: https://github.com/damienbod/keycloak-backchannel Setup Two services are set up to run in .NET Aspire. The first service is the official Elasticsearch docker container and deployed … … Continue reading →

This post shows how to use Elasticsearch in .NET Aspire. Elasticsearch is set up to use HTTPS with the dotnet developer certificates, and a simple client can be implemented to query the data.

Code: https://github.com/damienbod/keycloak-backchannel

Setup

Two services are set up to run in .NET Aspire. The first service is the official Elasticsearch docker container, deployed using dotnet developer certificates. The second service is an ASP.NET Core application using the Elastic.Clients.Elasticsearch Nuget package. The App.Host project is used to set this up and to link the services together.

Elasticsearch development server

The Elasticsearch container is configured in the program class of the App.Host project. The container is run using HTTPS and takes the Aspire parameters for configuration of the default account.

var elasticsearch = builder.AddElasticsearch("elasticsearch", password: passwordElastic)
    .WithDataVolume()
    .RunElasticWithHttpsDevCertificate(port: 9200);

The developer certificates need to be created and copied to a specific folder inside the Elasticsearch docker container. This is implemented using a shared folder, and the Elasticsearch xpack.security.http.ssl properties are set to match. The following three properties are used:

- xpack.security.http.ssl.enabled
- xpack.security.http.ssl.certificate
- xpack.security.http.ssl.key

using System.Diagnostics;
using System.IO.Hashing;
using System.Text;

namespace Aspire.Hosting;

// original src: https://github.com/dotnet/aspire-samples/tree/damianedwards/keycloak-sample/samples/Keycloak
public static class HostingElasticExtensions
{
    public static IResourceBuilder<ElasticsearchResource> RunElasticWithHttpsDevCertificate(this IResourceBuilder<ElasticsearchResource> builder, int port = 9200, int targetPort = 9200)
    {
        if (builder.ApplicationBuilder.ExecutionContext.IsRunMode)
        {
            builder
                .RunElasticWithHttpsDevCertificate()
                .WithHttpsEndpoint(port: port, targetPort: targetPort)
                .WithEnvironment("QUARKUS_HTTP_HTTP2", "false");
        }

        return builder;
    }

    public static IResourceBuilder<TResource> RunElasticWithHttpsDevCertificate<TResource>(this IResourceBuilder<TResource> builder)
        where TResource : IResourceWithEnvironment
    {
        const string DEV_CERT_DIR = "/usr/share/elasticsearch/config/certificates";

        if (builder.ApplicationBuilder.ExecutionContext.IsRunMode)
        {
            // Export the ASP.NET Core HTTPS development certificate & private key to PEM files, bind mount them into the container
            // and configure it to use them via the specified environment variables.
            var (certPath, _) = ExportElasticDevCertificate(builder.ApplicationBuilder);
            var bindSource = Path.GetDirectoryName(certPath) ?? throw new UnreachableException();

            if (builder.Resource is ContainerResource containerResource)
            {
                builder.ApplicationBuilder.CreateResourceBuilder(containerResource)
                    .WithBindMount(bindSource, DEV_CERT_DIR, isReadOnly: false);
            }

            builder
                .WithEnvironment("xpack.security.http.ssl.enabled", "true")
                .WithEnvironment("xpack.security.http.ssl.certificate", $"{DEV_CERT_DIR}/dev-cert.pem")
                .WithEnvironment("xpack.security.http.ssl.key", $"{DEV_CERT_DIR}/dev-cert.key");
        }

        return builder;
    }

    private static (string, string) ExportElasticDevCertificate(IDistributedApplicationBuilder builder)
    {
        var appNameHashBytes = XxHash64.Hash(Encoding.Unicode.GetBytes(builder.Environment.ApplicationName).AsSpan());
        var appNameHash = BitConverter.ToString(appNameHashBytes).Replace("-", "").ToLowerInvariant();
        var tempDir = Path.Combine(Path.GetTempPath(), $"aspire.{appNameHash}");
        var certExportPath = Path.Combine(tempDir, "dev-cert.pem");
        var certKeyExportPath = Path.Combine(tempDir, "dev-cert.key");

        if (File.Exists(certExportPath) && File.Exists(certKeyExportPath))
        {
            // Certificate already exported, return the path.
            return (certExportPath, certKeyExportPath);
        }
        else if (Directory.Exists(tempDir))
        {
            Directory.Delete(tempDir, recursive: true);
        }

        Directory.CreateDirectory(tempDir);

        var exportProcess = Process.Start("dotnet", $"dev-certs https --export-path \"{certExportPath}\" --format Pem --no-password");

        var exited = exportProcess.WaitForExit(TimeSpan.FromSeconds(5));
        if (exited && File.Exists(certExportPath) && File.Exists(certKeyExportPath))
        {
            return (certExportPath, certKeyExportPath);
        }
        else if (exportProcess.HasExited && exportProcess.ExitCode != 0)
        {
            throw new InvalidOperationException($"HTTPS dev certificate export failed with exit code {exportProcess.ExitCode}");
        }
        else if (!exportProcess.HasExited)
        {
            exportProcess.Kill(true);
            throw new InvalidOperationException("HTTPS dev certificate export timed out");
        }

        throw new InvalidOperationException("HTTPS dev certificate export failed for an unknown reason");
    }
}

When the App.Host project is started, the Elasticsearch container boots up and the server can be tested using the “_cat” HTTP GET requests; alternatively, the default base URL returns server information about Elasticsearch.

https://localhost:9200/_cat

Elasticsearch client

The Elasticsearch client was implemented using the Elastic.Clients.Elasticsearch Nuget package. The client project in .NET Aspire needs to reference the Elasticsearch server using the WithReference method.

builder.AddProject<Projects.ElasticsearchAuditTrail>("elasticsearchaudittrail")
    .WithExternalHttpEndpoints()
    .WithReference(elasticsearch);

Elasticsearch can be queried using a simple query string search.

public async Task<IEnumerable<T>> QueryAuditLogs(string filter = "*", AuditTrailPaging auditTrailPaging = null)
{
    var from = 0;
    var size = 10;

    EnsureElasticClient(_indexName, _options.Value);
    await EnsureAlias();

    if (auditTrailPaging != null)
    {
        from = auditTrailPaging.Skip;
        size = auditTrailPaging.Size;
        if (size > 1000)
        {
            // max limit 1000 items
            size = 1000;
        }
    }

    var searchRequest = new SearchRequest<T>(Indices.Parse(_alias))
    {
        Size = size,
        From = from,
        Query = new SimpleQueryStringQuery
        {
            Query = filter
        },
        Sort = BuildSort()
    };

    var searchResponse = await _elasticsearchClient
        .SearchAsync<T>(searchRequest);

    return searchResponse.Documents;
}

See the source code: https://github.com/damienbod/keycloak-backchannel/blob/main/AuditTrail/AuditTrailProvider.cs

Notes

With this setup, it is easy to develop using Elasticsearch as a container and no service needs to be implemented on the developer host PC. Setting up HTTPS is a little bit complicated and it would be nice to see this supported better. The development environment should be as close as possible to the deployed versions. HTTPS should be used in development.

Links

https://learn.microsoft.com/en-us/dotnet/aspire/search/elasticsearch-integration

https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html

https://www.elastic.co/products/elasticsearch

https://github.com/elastic/elasticsearch-net

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html

Saturday, 14. September 2024

Mike Jones: self-issued

OAuth 2.0 Protected Resource Metadata draft addressing reviews since IETF Last Call

Aaron Parecki and I published a new version of the “OAuth 2.0 Protected Resource Metadata” specification that addresses the review comments received since the IETF Last Call. Per the history entries, the changes were: Added metadata values declaring support for DPoP and mutual-TLS client certificate-bound access tokens. Added missing word caught during IANA review. Addressed ART, […]

Aaron Parecki and I published a new version of the “OAuth 2.0 Protected Resource Metadata” specification that addresses the review comments received since the IETF Last Call. Per the history entries, the changes were:

- Added metadata values declaring support for DPoP and mutual-TLS client certificate-bound access tokens.
- Added missing word caught during IANA review.
- Addressed ART, SecDir, and OpsDir review comments by Arnt Gulbrandsen, David Mandelberg, and Bo Wu, resulting in the following changes:
  - Added step numbers to sequence diagram.
  - Defined meaning of omitting bearer_methods_supported metadata parameter.
  - Added internationalization of human-readable metadata values using the mechanism from [RFC7591].
  - Added resource_name metadata parameter, paralleling client_name in [RFC7591].
  - Added Security Considerations section on metadata caching.
  - Used and referenced Resource Identifier definition.
  - Added motivating example of an email client to intro.

The specification is available at:

https://www.ietf.org/archive/id/draft-ietf-oauth-resource-metadata-09.html

Friday, 13. September 2024

Georg C. F. Greve

Coming full circle: How Self-Sovereign Identity Could Lose Its Way

“A picture showing the knight of Self Sovereign Identity losing his way, going into the valley of doom where certain death awaits.” Self-Sovereign Identity (SSI) offers a robust and scale-able solution to the challenges of digital identity that is secure while also respecting privacy, freedom, and human rights. However, a mix of political and industry pressures, along with a push for quick ado
“A picture showing the knight of Self Sovereign Identity losing his way, going into the valley of doom where certain death awaits.”

Self-Sovereign Identity (SSI) offers a robust and scale-able solution to the challenges of digital identity that is secure while also respecting privacy, freedom, and human rights. However, a mix of political and industry pressures, along with a push for quick adoption, risks turning SSI into just another marketing buzzword.

Nowhere is this trend more visible than in the mobile Drivers License (mDL) and the ongoing conversations around the second version of the European Union’s “electronic IDentification, Authentication and trust Services” (eIDAS).

Old Interests, New Technology

The first version of eIDAS was an adoption failure. While it was not providing privacy by design, technologically it was sufficiently correct to function. It did so by giving a central role to Trust Service Providers (TSPs) and Certificate Authorities (CAs).

These intermediaries sought to exploit their government issued monopoly and consequently choked off adoption for the first version of eIDAS.

In doing so, they repeated the same mistake made on the wider web, where excessive pricing led to stagnation in the adoption of encrypted communications. In 2014, only 10 short years ago, the vast majority of all connections was not using any kind of transport layer security (TLS).

HTTPS adoption became mainstream only between 2015 and 2019 — via Matthew Greene, article linked below
It was the Snowden Revelations that changed everything. But not in the way the intermediaries had hoped.

While I am sure many were rubbing their hands, and expecting the pipe they were sitting on to finally start flowing, everyone in the industry knew that predatory pricing was to blame for the low adoption. So in November 2014, Let’s Encrypt was founded in order to provide for free what Certificate Authorities are selling at premium prices.

Today, Let’s Encrypt provides over 55% of all certificates used on the web. Which provides for two conclusions:

Let’s Encrypt has become the single point of failure for security on the internet. And the addressable market for TSPs has shrunk dramatically, and keeps shrinking.

Simultaneously, the costs for TSPs are growing. Not only in terms of technology cost. But also in terms of compliance cost. Because there is a whole industry of consultants and auditors that service TSPs, extracting six figure payments each year for initial certification and re-certification. Growing cost and shrinking markets are not a great prospect for any industry.

So when the new version of eIDAS came along, promising to switch to Self Sovereign Identity for privacy by design, security, and elimination of intermediaries, the TSPs deployed their lobbyists to Brussels to stop this threat to their remaining business.

The result was a bureaucratic monster, for which some people at DICE 2024 in Zurich voiced doubt whether it could even be implemented in practice.

Intermediaries are once again involved in every step. And all the technological choices are based on legacy Web2.0 technology shaped by the TSPs, without privacy by design, and with well known and regularly exploited security issues.

So TSP lobbying successfully defended their government mandated monopoly over the digital domain, and the fundamental issue that stopped adoption of the first version of eIDAS remains unresolved. Only the future will show whether this second attempt of the TSPs at monetizing all digital interactions will fare any better.

But that isn’t even the biggest problem.

The Web is a bad idea for personal identity
The web is a fundamentally unsound technology choice for personal identity, for a variety of reasons, starting with security.

The early web wasn’t really designed for privacy or security. Its original design was inspired by the need for institutional information sharing. The web was invented in 1989; security was added in 1995, when Netscape came up with Secure Sockets Layer (SSL), which led the way to today’s Transport Layer Security (TLS). This allowed encryption of communication, but all security was dependent on key management.

Because it was the only practical way at the time, intermediaries became the root of all trust for virtually all users. On top of those issues, secure communication requires two parties who may know nothing about one another to negotiate a secure key to use for encryption with one another. That negotiation must happen over the same channel that is being used for encrypted communication.

Which is a bit like yelling the combination of the key pad to your house through the open window over to your neighbour so he can come in and water the plants. If there are no eavesdroppers at that time: Great. If there are, they now also have the key. You’ll find out whether you were safe in a few weeks when you get back from vacation.

Diffie-Hellman key exchange was invented to solve that particular problem. But it is only secure if used correctly and keys are being rotated and refreshed often. Re-use of static keys or weak parameters can weaken security considerably.
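
For reference, this is roughly what an ephemeral Diffie-Hellman style key agreement looks like in code, here as a minimal X25519 sketch using the Python cryptography package (illustrative only):

from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Each party generates a fresh, ephemeral key pair per session; re-using static
# keys is exactly the kind of weakness mentioned above.
alice_private = X25519PrivateKey.generate()
bob_private = X25519PrivateKey.generate()

# Only the public keys travel over the insecure channel...
alice_public = alice_private.public_key()
bob_public = bob_private.public_key()

# ...and each side derives the same shared secret locally.
alice_shared = alice_private.exchange(bob_public)
bob_shared = bob_private.exchange(alice_public)
assert alice_shared == bob_shared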

On top of that, cryptography is an ever evolving field, and the need for web servers to support older browsers means that sometimes, outdated or known vulnerable settings are still being allowed. Managing all that is complex. It requires organisational level security teams.

And much like charging your modern Electric Vehicle using a coal powered steam engine to drive a generator, trying to fast track adoption for Self Sovereign Identity by using Web based Decentralized Identifier (DID) methods effectively negates all the conceptual and architectural progress that has been made with SSI.

All the underlying security issues our industry has struggled with for the past 30+ years, all the conceptual weaknesses of the Web, all the privacy issues and compromises also apply to SSI when using did:web and its descendants.

In practical terms this translates into most people once again finding themselves forced to entrust their personal identity to intermediaries in the form of custodial wallets and platforms. Just as they are doing today with Google, Facebook, Amazon, Apple, Microsoft.

Most people will rent usage of their own identities, but they will have no ownership or agency. And those intermediaries will once again have full visibility and control over everything people are doing online.

That is a steep price to pay. Is it worth it at least in terms of security?

Even these large organisations get it wrong often enough for data leaks and other issues to occur frequently. When using the Web for personal identity, these compromises will have consequences beyond anything we have seen so far. What happens when the compromise has enabled the attacker to provide legally valid signatures in your name to sign contracts, or consent to sharing your personal pictures with the wrong parties?

And that’s only the start of the valley of doom.

The Web is based on top of the Domain Name System (DNS). Which is a system in which you rent usage of certain human readable domains on a yearly basis. Without it, the Web falls apart.

So let’s assume enlightened users rent their own domains to become independent from the intermediaries. Now they can move between them in case one of them has issues, goes out of business, or gets acquired by a billionaire with a questionable political agenda. Just move your domain and you’re good, right?

Not so fast.

Not only is it possible you may lose your domain, either due to changes in DNS policy, legal aspects, such as trademarks, or because you’ve missed a yearly payment.

If a personal identity can be lost that easily, was it ever really yours?

But there is an even bigger issue.

The W3C Decentralized Identifier (DID) standard was designed to provide for unique, non-correlatable identifiers for each individual interaction. Which sounds great in theory. But how non-correlatable are did:tdw:johnsmith.eu:223499182 and did:tdw:johnsmith.eu:673403151?

Would you be able to tell they might both belong to the same person, whose name might be John Smith?

Because the domain is always an intrinsic part of each Web based DID method, none of them allow for personal ownership without complete loss of pseudonymity, and non-correlatability.
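
To make the point concrete, even naive string handling of the two example identifiers above leads straight back to the same domain (illustrative parsing of these example strings only):

dids = ["did:tdw:johnsmith.eu:223499182", "did:tdw:johnsmith.eu:673403151"]
domains = {did.split(":")[2] for did in dids}
print(domains)  # {'johnsmith.eu'} -- both identifiers correlate via the domain owner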

So whatever pseudonymity and privacy can be achieved leads back full circle to large intermediaries and custodial wallets — just like the federated identity infrastructure of today.

“Anakin Skywalker at a crossroads. Go down the Web based SSI method path, and become Darth Vader, or choose a world of privacy, freedom and security?” TL;DR: Don’t do it, Anakin!
Web technologies are a fundamentally poor choice for personal identity.

They can work in domains where privacy and non-correlatability are not required, for instance organisational identity. But for personal identity, they compromise almost everything that Self Sovereign Identity (SSI) was intended and designed for.

You cannot retrofit privacy and security onto an architecture that wasn’t built for them. Just look at Email. It has the same problem. Decades of adding layers of complexity and auxiliary systems have kept moving the problem somewhere else in the stack, to another component. Lots of snake oil has been sold. Some people got very rich this way. But the fundamental issues remain unsolved.

We should not repeat that same mistake for Self Sovereign Identity.

No system that is intended for personal identity should ever use did:web, did:tdw, or any of the other web based DID specifications. The concepts of SSI can solve all the issues above. They can provide people with real agency and control over their own identity. But not when using the Web.

The solution is simple.

Stop using Web based DID methods for personal identity!
Humans are not organizations. Stop using Web based DID methods for personal identity. did:web and its descendants are very useful for some organizational use cases, and for prototyping new ideas. They have NO place when it comes to personal identity, or use cases that require strong privacy or security.

There are (at least) three superior alternatives that come to mind, all of which can also provide a secondary channel for key negotiation.

Stay tuned.

Thursday, 12. September 2024

Bill Wendels Real Estate Cafe

Leverage $25K downpayment assistance to protect homebuyers & revitalize buyer agency movement

Three decades ago was invited to participate in a collaborative effort to develop President Bill Clinton’s National Homeownership Strategy. Those 100 recommendations raised homeownership from… The post Leverage $25K downpayment assistance to protect homebuyers & revitalize buyer agency movement first appeared on Real Estate Cafe.

Three decades ago I was invited to participate in a collaborative effort to develop President Bill Clinton’s National Homeownership Strategy. Those 100 recommendations raised homeownership from…

The post Leverage $25K downpayment assistance to protect homebuyers & revitalize buyer agency movement first appeared on Real Estate Cafe.

Wednesday, 11. September 2024

Patrick Breyer

Latest EU push for chat control: Will messenger services be blocked in Europe?

Hungary this week presented the EU interior ministers with a new proposal for introducing chat control, which was leaked by POLITICO. The use of “artificial intelligence” to search for suspicious images and chats …

Hungary this week presented the EU interior ministers with a new proposal for introducing chat control, which was leaked by POLITICO. Under it, the use of “artificial intelligence” to search for suspicious images and chats would remain optional for messenger and chat providers. For the search for known illegal material, however, the plan, unique worldwide, remains to make the mass, suspicionless and, according to the BKA, highly error-prone scanning of all private chats mandatory for all service providers, even if the previously secure end-to-end encryption of messenger services has to be undermined to achieve it. The EU governments are to state their position on the proposal by 23 September, and the EU interior ministers are to adopt it on 10 October. The messenger providers Signal and Threema have already announced that they will under no circumstances build such surveillance functions into their apps and would, if necessary, discontinue their services in the EU entirely.

“Instead of helping children protect themselves from extortion and exploitation online through safer apps, abuse victims are being left out in the rain by a bill that is detached from reality and doomed to fail in court,” criticizes Dr. Patrick Breyer, former Member of the European Parliament for the Pirate Party and co-negotiator of the European Parliament’s critical position. “Flooding our police with largely irrelevant reports about long-known material stops no abuse; it actually makes its detection harder. If this chat control comes, we citizens will no longer be able to use common secure messengers, which means we lose contact with our friends and colleagues all over the world. Do we really want the EU to become the world leader in wiretapping our smartphones and in the blanket surveillance of the chats of millions of law-abiding citizens?”

Breyer calls on people to put pressure on Federal Minister of the Interior Faeser and Federal Minister of Justice Buschmann: “Merely being against it and then being outvoted is not enough to defend our digital privacy of correspondence and secure encryption. The federal government must now urgently seek allies in order to achieve a blocking minority and a fundamental revision. The security of our communications is at stake!”

Info website: https://www.chatkontrolle.de

Help stop chat control now

Monday, 09. September 2024

Damien Bod

Implement OpenID Connect Back-Channel Logout using ASP.NET Core, Keycloak and .NET Aspire

This post shows how to implement an OpenID Connect back-channel logout using Keycloak, ASP.NET Core and .NET Aspire. The Keycloak and the Redis cache are run as containers using .NET Aspire. Two ASP.NET Core UI applications are used to demonstrate the server logout. Code: https://github.com/damienbod/keycloak-backchannel Setup The applications are run and tested using .NET Aspire. … … Continue read

This post shows how to implement an OpenID Connect back-channel logout using Keycloak, ASP.NET Core and .NET Aspire. The Keycloak and the Redis cache are run as containers using .NET Aspire. Two ASP.NET Core UI applications are used to demonstrate the server logout.

Code: https://github.com/damienbod/keycloak-backchannel

Setup

The applications are run and tested using .NET Aspire. The UI applications are setup using the OpenID Connect code flow with PKCE and OAuth PAR. Two docker containers are used, one for the Redis cache and one for the Keycloak server.

.NET Aspire Setup

The AppHost project in .NET Aspire is used to configure the different services. The Keycloak container is run using HTTPS with developer certificates. The ASP.NET Core applications are run using development certificates as well. For the Keycloak server to access the docker host, it must register the same developer certificates from the host, or disable the certificate trust manager inside the container. This works well, but should only be set up like this in development. The different projects must reference each other as required. To use Redis and Keycloak, the Aspire Nuget packages for these containers need to be installed.

var keycloak = builder.AddKeycloakContainer("keycloak", userName: userName, password: password, port: 8080)
    .WithArgs("--features=preview")
    // for more details regarding disable-trust-manager see https://www.keycloak.org/server/outgoinghttp#_client_configuration_command
    // IMPORTANT: use this command ONLY in local development environment!
    .WithArgs("--spi-connections-http-client-default-disable-trust-manager=true")
    .WithDataVolume()
    .RunWithHttpsDevCertificate(port: 8081);

var cache = builder.AddRedis("cache", 6379)
    .WithDataVolume();

var mvcpar = builder.AddProject<Projects.MvcPar>("mvcpar")
    .WithExternalHttpEndpoints()
    .WithReference(keycloak)
    .WithReference(cache);

var mvcbackchanneltwo = builder.AddProject<Projects.MvcBackChannelTwo>("mvcbackchanneltwo")
    .WithExternalHttpEndpoints()
    .WithReference(keycloak)
    .WithReference(cache);

Keycloak OpenID Connect client configuration

The Keycloak client should have the backchannel logout activated. The container uses the localhost applications from the docker host and so the host.docker.internal domain is used. The logout endpoint is implemented in the ASP.NET Core application.

ASP.NET Core Logout

Each ASP.NET Core application that supports the back-channel logout must have a server implementation and provide a web hook for the identity provider (Keycloak) logout event. If multiple instances are running, one can send a logout event to Keycloak. This ends the session on the identity provider and sends a logout POST request to all server backends hosting the UI application. The logout event is handled and persisted to a distributed cache. For all other instances which request data from the server for the same user and session, the session is ended and the user must authenticate again.

- Logout request
- Webhook for logout event from Keycloak server for all instances
- Persist event to cache if missing and logout
- Logout UI on next HTTP request for user sessions

services.AddTransient<CookieEventHandler>();
services.AddSingleton<LogoutSessionManager>();
services.AddHttpClient();

services.Configure<AuthConfiguration>(configuration.GetSection("AuthConfiguration"));
var authConfiguration = configuration.GetSection("AuthConfiguration");

builder.AddRedisDistributedCache("cache");

services.AddAuthentication(options =>
{
    options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme;
    options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme;
})
.AddCookie(options =>
{
    options.ExpireTimeSpan = TimeSpan.FromMinutes(60);
    options.Cookie.Name = "MvcPar";
    options.EventsType = typeof(CookieEventHandler);
})
.AddOpenIdConnect(OpenIdConnectDefaults.AuthenticationScheme, options =>
{
    options.Authority = authConfiguration["StsServerIdentityUrl"];
    options.ClientSecret = authConfiguration["ClientSecret"];
    options.ClientId = authConfiguration["Audience"];
    options.ResponseType = OpenIdConnectResponseType.Code;
    options.Scope.Clear();
    options.Scope.Add("openid");
    options.Scope.Add("profile");
    options.Scope.Add("email");
    options.Scope.Add("offline_access");
    options.ClaimActions.Remove("amr");
    options.ClaimActions.MapJsonKey("website", "website");
    options.GetClaimsFromUserInfoEndpoint = true;
    options.SaveTokens = true;
    options.PushedAuthorizationBehavior = PushedAuthorizationBehavior.Require;
    options.TokenValidationParameters = new TokenValidationParameters
    {
        NameClaimType = JwtClaimTypes.Name,
        RoleClaimType = JwtClaimTypes.Role,
    };
});

Note: The sample code in the repository was created using the IdentityServer4 Samples.

Redis cache

Redis Insight can be used to view the Redis cache data. Each time the application handles a new user and session logout event, it persists the event to the cache. If any further application instances are authenticated using this session and user, the application will sign-out as well on the next server event. The expiry time of the cache should be long enough so that an offline session cannot be opened after it expires.

Limitations

The back-channel logout only works on a per browser session because Keycloak creates new sessions for each browser. When the logout is received it is filtered and handled using the Keycloak session. If Keycloak can use a single session for all browsers of each user, then the logout can work for all active UI apps of the same user.

Links

https://www.keycloak.org/

https://openid.net/specs/openid-connect-backchannel-1_0.html

https://ldapwiki.com/wiki/OpenID%20Connect%20Back-Channel%20Logout

https://datatracker.ietf.org/meeting/97/materials/slides-97-secevent-oidc-logout-01

https://docs.microsoft.com/en-us/aspnet/core/fundamentals/app-state

https://docs.microsoft.com/en-us/azure/azure-cache-for-redis/cache-dotnet-core-quickstart

https://github.com/RedisInsight/RedisInsight

Sunday, 08. September 2024

Werdmüller on Medium

A conversation with myself about immigration

What is your ideal place? What are its values? Continue reading on Medium »

What is your ideal place? What are its values?

Continue reading on Medium »

Friday, 06. September 2024

Justin Richer

Making Bubbles: Re-connecting

If a set of accounts live in isolation forever, what happens to those accounts only matters within that isolated system. But when we make a bubble in our federated network, we aren’t just making a stand-alone system that can go live in a fully disconnected state. Over time, we expect things to re-connect, and when that happens, data needs to be harmonized across the boundaries. So many bubbles, so

If a set of accounts live in isolation forever, what happens to those accounts only matters within that isolated system. But when we make a bubble in our federated network, we aren’t just making a stand-alone system that can go live in a fully disconnected state. Over time, we expect things to re-connect, and when that happens, data needs to be harmonized across the boundaries.

So many bubbles, so many connections

Data Synchronization

When multiple independent systems live together in the world and share data, inevitably that data is going to get out of sync. In a purely hierarchical system, we’re mostly concerned with building a consensus around the correct state of the shared data. We see this approach in distributed ledger systems, where nodes eventually come to a common view of what the shared state of the world ought to be.

But we don’t have that in a bubble-based architecture, because we don’t expect everyone to have the same view. Instead, we expect many different independent views to contribute to each other in a distributed fashion. This effectively means that for each bubble, it can send updates in and out to other systems. In most cases, there is a directionality to the data flow: one side is going to be treated as more authoritative than the other for a given context. What a bubble does when it’s on either end of that gap changes how we view the synchronization.

Account Updates From Above

When changes are pushed to us from an authoritative source, the simplest thing is to overwrite everything. After all, if it’s authoritative, why wouldn’t we just take that statement as truth? But the reality is that we’ve likely augmented our record of that user with additional details, overrides, and other localized changes that we don’t want to lose.

Local data (blue) shadows updates from the source (orange), in some cases

In these cases, we can shadow the data. In other words, we keep a copy of the source’s data separate from our own local view. When we get an update from that source, we can update our copy of the source’s data with wild abandon. We can then decide, by local policy, whether we want to adjust our overrides based on the update. Importantly, this decision is separate from accepting and processing the updated data from the source. This setup allows us to keep local information in the bubble at the same time that we sync from elsewhere.
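
As a toy illustration of that shadowing idea (the data model and the merge policy here are invented for the example, not taken from any particular implementation):

from dataclasses import dataclass, field

@dataclass
class ShadowedAccount:
    source: dict = field(default_factory=dict)     # authoritative copy from upstream
    overrides: dict = field(default_factory=dict)  # local additions and overrides

    def apply_upstream_update(self, update):
        # Accept the authoritative update wholesale into our copy of the source data...
        self.source.update(update)
        # ...then apply a local policy: here, drop any override the source now agrees with.
        for key in list(self.overrides):
            if self.source.get(key) == self.overrides[key]:
                del self.overrides[key]

    @property
    def view(self):
        # The local view: authoritative data with local overrides layered on top.
        return {**self.source, **self.overrides}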

Account Updates From Below

An authoritative bubble is sometimes going to want to pull updated information from the bubbles that live downstream. These are systems that we’ve sent accounts out to, and those systems might have something to tell us about our users. Maybe there’s an additional accreditation that’s been earned, or something to represent additional access details outside of our bubble, or even just an update to one of the core fields we sent down.

In any case, the downstream bubble is sending us additional data about a user, and we now have the chance to do something about it. If nothing else, we can store it and note it. If we want to, we can update the user’s record that we hold locally, and even go so far as to propagate that downward again to other bubbles.

Changing Trust

It’s not just user data that we can pass around, though that’s the most common piece we’d expect to see. The bubbles can also pass information about other bubbles to each other, and incorporate what they learn into their own systems.

For many situations that fit the bubble architecture patterns, we expect to meet new peers in the field and make new decisions based on local context and requirements. These newly-discovered bubbles can then be propagated through the network, along with potentially interesting information about what the presenting bubble trusts it for.

Audit Logs

Now that we can identify users and the systems they come from, we can start to do one of the most interesting and perhaps complex jobs of a reconnected bubble: audit logs. When a bubble gets provisioned, that provisioning authority is likely to want to know what happens in that bubble during the disconnected spells. The bubble can package up the relevant audit log history and pass it along to the authorities that need it.

But auditing can go further than that: for any user that enters our bubble from somewhere else, we probably want to report back to the system that sent them to us. Since we know where we originally learned of them, and we know how to identify that system as a whole, we can filter and target the information we’re sending over. And we can do this while processing the updates they’re sending us about the account.

Conclusion

The bubble pattern sits between the always-connected and always-separate models, but the moment of reconnection gives us a profound opportunity to process user and trust data in ways that we may have never considered.

Thursday, 05. September 2024

Jon Udell

Revisiting the Elm City project

“Communities that want to build comprehensive public calendars will be able to do so using a hybrid approach that blends existing iCalendar feeds with feeds synthesized from web calendars. It’s not a perfect solution, but with LLM assistance it’s a workable one. And who knows, maybe if people see what’s possible when information silos converge, … Continue reading Revisiting the Elm City project

“Communities that want to build comprehensive public calendars will be able to do so using a hybrid approach that blends existing iCalendar feeds with feeds synthesized from web calendars. It’s not a perfect solution, but with LLM assistance it’s a workable one. And who knows, maybe if people see what’s possible when information silos converge, the common tools that can ease convergence will seem more attractive.” — An LLM-Turbocharged Community Calendar Reboot

Part of the LLM series at The New Stack.

Wednesday, 04. September 2024

Heres Tom with the Weather

At Bar del Pla in Barcelona

At Bar del Pla in Barcelona

Tuesday, 03. September 2024

Patrick Breyer

Chat control snap decision? Hungary wants to push through the EU’s unprecedented plans for mass surveillance of messengers after all

As early as tomorrow morning, the EU governments are to support the controversial chat control bill after all, which was taken off the agenda in June after massive protests. The new Hungarian …

As early as tomorrow morning, the EU governments are expected to back the controversial chat control bill that was taken off the agenda in June after massive protests. According to a report by the news service Contexte, the new Hungarian Council Presidency wants to achieve this with a detail change leaked by Politico, namely dropping the particularly error-prone search for unknown material from the mandatory chat control (it is to remain possible on a voluntary basis). Otherwise, the plan still provides for the automated searching and, where flagged, disclosure of private chats, including end-to-end encrypted chats, that allegedly contain previously classified suspicious photos or videos. If a user refuses this "upload moderation" of their chats, they would no longer be able to send or receive any images, videos or URLs. Signal and Threema have announced that they would discontinue their services in the EU if they were forced to implement the proposed automated mass surveillance (so-called "client-side scanning").

The former Pirate Party Member of the European Parliament, Patrick Breyer, is now calling on citizens to contact the EU governments: "In June, under massive public pressure, there was still a wafer-thin blocking minority to save the digital privacy of correspondence, but barely noticed flash moves, intervening changes of government and minimal concessions can change that quickly. If chat control comes, we will quite simply no longer be able to use common secure messengers – that means we lose contact with our friends and colleagues all over the world," warns Breyer. "Do we really want Europe to become the world leader in bugging our smartphones and in the blanket surveillance of the chats of millions of law-abiding citizens? The European Parliament is convinced that this Orwellian chat control betrays the urgent goal of better protecting children and victims, because it will inevitably fail in court. It therefore calls for genuinely effective child protection through more secure apps, proactive clean-up of the internet and an obligation to delete illegal content – none of which is provided for in the latest Orbán push that the governments are to take a position on tomorrow. Now it is up to us to stop it!"

Breyer calls the attempt to limit chat control to supposedly "known" illegal content window dressing: "Whatever the aim, even the postal service is not allowed to simply open and search every letter without suspicion. Precisely the scanning for supposedly known content that US corporations already practise voluntarily today leads to the disclosure of thousands of completely legal private chats, to the overburdening of law enforcement and to the mass criminalisation of minors. Indiscriminate mass surveillance destroys the digital privacy of correspondence and secure encryption, which we all, and our children too, urgently depend on."

Breyer's information portal on chat control: chatkontrolle.de

Help stop chat control now

Monday, 02. September 2024

Damien Bod

Implement ASP.NET Core OpenID Connect OAuth PAR client with Keycloak using .NET Aspire

This post shows how to implement an ASP.NET Core application which uses OpenID Connect and OAuth PAR for authentication. The client application uses Keycloak as the identity provider. The Keycloak application is hosted in a docker container. The applications are run locally using .NET Aspire. This makes it really easy to develop using containers. Code: … … Continue reading →

This post shows how to implement an ASP.NET Core application which uses OpenID Connect and OAuth PAR for authentication. The client application uses Keycloak as the identity provider. The Keycloak application is hosted in a docker container. The applications are run locally using .NET Aspire. This makes it really easy to develop using containers.

Code: https://github.com/damienbod/keycloak-backchannel

Setup

The standard Microsoft Aspire template was used to set up the .NET Aspire AppHost and ServiceDefaults projects. The Keycloak container service was added to the AppHost project using the Keycloak.AuthServices.Aspire.Hosting NuGet package. An ASP.NET Core Razor Page project was added as the UI client, but any project type can be used, such as Blazor or MVC.

Keycloak Setup

The Keycloak container is completely set up in the AppHost project. The Keycloak.AuthServices.Aspire.Hosting NuGet package is used to add the integration to .NET Aspire. For this to work, Docker Desktop needs to be installed in the development environment. I want to use the Keycloak preview features and enabled them using the WithArgs method. If using the Microsoft Keycloak package, the setup is almost identical.

var userName = builder.AddParameter("userName");
var password = builder.AddParameter("password", secret: true);

var keycloak = builder.AddKeycloakContainer("keycloak",
        userName: userName, password: password, port: 8080)
    .WithArgs("--features=preview")
    .WithDataVolume()
    .RunWithHttpsDevCertificate(port: 8081);

I want to develop using HTTPS, so the Keycloak container needs to run with HTTPS as well. This was not so simple to set up, but Damien Edwards provided a solution which works great.

The RunWithHttpsDevCertificate extension method was added using his code and adapted so that the port is fixed for the HTTPS Keycloak server. This implementation requires the System.IO.Hashing NuGet package.

using System.Diagnostics;
using System.IO.Hashing;
using System.Text;

namespace Aspire.Hosting;

/// <summary>
/// Original src code:
/// https://github.com/dotnet/aspire-samples/blob/b741f5e78a86539bc9ab12cd7f4a5afea7aa54c4/samples/Keycloak/Keycloak.AppHost/HostingExtensions.cs
/// </summary>
public static class HostingExtensions
{
    /// <summary>
    /// Injects the ASP.NET Core HTTPS developer certificate into the resource via the specified environment variables when
    /// <paramref name="builder"/>.<see cref="IResourceBuilder{T}.ApplicationBuilder">ApplicationBuilder</see>.
    /// <see cref="IDistributedApplicationBuilder.ExecutionContext">ExecutionContext</see>.<see cref="DistributedApplicationExecutionContext.IsRunMode">IsRunMode</see><c> == true</c>.<br/>
    /// If the resource is a <see cref="ContainerResource"/>, the certificate files will be bind mounted into the container.
    /// </summary>
    /// <remarks>
    /// This method <strong>does not</strong> configure an HTTPS endpoint on the resource.
    /// Use <see cref="ResourceBuilderExtensions.WithHttpsEndpoint{TResource}"/> to configure an HTTPS endpoint.
    /// </remarks>
    public static IResourceBuilder<TResource> RunWithHttpsDevCertificate<TResource>(this IResourceBuilder<TResource> builder, string certFileEnv, string certKeyFileEnv)
        where TResource : IResourceWithEnvironment
    {
        const string DEV_CERT_DIR = "/dev-certs";

        if (builder.ApplicationBuilder.ExecutionContext.IsRunMode)
        {
            // Export the ASP.NET Core HTTPS development certificate & private key to PEM files, bind mount them into the container
            // and configure it to use them via the specified environment variables.
            var (certPath, _) = ExportDevCertificate(builder.ApplicationBuilder);
            var bindSource = Path.GetDirectoryName(certPath) ?? throw new UnreachableException();

            if (builder.Resource is ContainerResource containerResource)
            {
                builder.ApplicationBuilder.CreateResourceBuilder(containerResource)
                    .WithBindMount(bindSource, DEV_CERT_DIR, isReadOnly: true);
            }

            builder
                .WithEnvironment(certFileEnv, $"{DEV_CERT_DIR}/dev-cert.pem")
                .WithEnvironment(certKeyFileEnv, $"{DEV_CERT_DIR}/dev-cert.key");
        }

        return builder;
    }

    /// <summary>
    /// Configures the Keycloak container to use the ASP.NET Core HTTPS development certificate created by <c>dotnet dev-certs</c> when
    /// <paramref name="builder"/><c>.ExecutionContext.IsRunMode == true</c>.
    /// </summary>
    /// <remarks>
    /// See <see href="https://learn.microsoft.com/dotnet/core/tools/dotnet-dev-certs">https://learn.microsoft.com/dotnet/core/tools/dotnet-dev-certs</see>
    /// for more information on the <c>dotnet dev-certs</c> tool.<br/>
    /// See <see href="https://learn.microsoft.com/aspnet/core/security/enforcing-ssl#trust-the-aspnet-core-https-development-certificate-on-windows-and-macos">
    /// https://learn.microsoft.com/aspnet/core/security/enforcing-ssl</see>
    /// for more information on the ASP.NET Core HTTPS development certificate.
    /// </remarks>
    public static IResourceBuilder<KeycloakResource> RunWithHttpsDevCertificate(this IResourceBuilder<KeycloakResource> builder, int port = 8081, int targetPort = 8443)
    {
        if (builder.ApplicationBuilder.ExecutionContext.IsRunMode)
        {
            // Mount the ASP.NET Core HTTPS development certificate in the Keycloak container and configure Keycloak to it
            // via the KC_HTTPS_CERTIFICATE_FILE and KC_HTTPS_CERTIFICATE_KEY_FILE environment variables.
            builder
                .RunWithHttpsDevCertificate("KC_HTTPS_CERTIFICATE_FILE", "KC_HTTPS_CERTIFICATE_KEY_FILE")
                .WithHttpsEndpoint(port: port, targetPort: targetPort)
                .WithEnvironment("KC_HOSTNAME", "localhost")
                // Without disabling HTTP/2 you can hit HTTP 431 Header too large errors in Keycloak.
                // Related issues:
                // https://github.com/keycloak/keycloak/discussions/10236
                // https://github.com/keycloak/keycloak/issues/13933
                // https://github.com/quarkusio/quarkus/issues/33692
                .WithEnvironment("QUARKUS_HTTP_HTTP2", "false");
        }

        return builder;
    }

    private static (string, string) ExportDevCertificate(IDistributedApplicationBuilder builder)
    {
        // Exports the ASP.NET Core HTTPS development certificate & private key to PEM files using 'dotnet dev-certs https' to a temporary
        // directory and returns the path.
        // TODO: Check if we're running on a platform that already has the cert and key exported to a file (e.g. macOS) and just use those instead.
        var appNameHashBytes = XxHash64.Hash(Encoding.Unicode.GetBytes(builder.Environment.ApplicationName).AsSpan());
        var appNameHash = BitConverter.ToString(appNameHashBytes).Replace("-", "").ToLowerInvariant();
        var tempDir = Path.Combine(Path.GetTempPath(), $"aspire.{appNameHash}");
        var certExportPath = Path.Combine(tempDir, "dev-cert.pem");
        var certKeyExportPath = Path.Combine(tempDir, "dev-cert.key");

        if (File.Exists(certExportPath) && File.Exists(certKeyExportPath))
        {
            // Certificate already exported, return the path.
            return (certExportPath, certKeyExportPath);
        }
        else if (Directory.Exists(tempDir))
        {
            Directory.Delete(tempDir, recursive: true);
        }

        Directory.CreateDirectory(tempDir);

        var exportProcess = Process.Start("dotnet", $"dev-certs https --export-path \"{certExportPath}\" --format Pem --no-password");

        var exited = exportProcess.WaitForExit(TimeSpan.FromSeconds(5));

        if (exited && File.Exists(certExportPath) && File.Exists(certKeyExportPath))
        {
            return (certExportPath, certKeyExportPath);
        }
        else if (exportProcess.HasExited && exportProcess.ExitCode != 0)
        {
            throw new InvalidOperationException($"HTTPS dev certificate export failed with exit code {exportProcess.ExitCode}");
        }
        else if (!exportProcess.HasExited)
        {
            exportProcess.Kill(true);
            throw new InvalidOperationException("HTTPS dev certificate export timed out");
        }

        throw new InvalidOperationException("HTTPS dev certificate export failed for an unknown reason");
    }
}

Note: The AppHost project must reference all the services used in the solution.

Keycloak client configuration

See the razorpagepar.json file in the git repository. This is a Keycloak export of the whole client. This can be imported and updated.

The client is configured to use PAR.

ASP.NET Core OpenID Connect client using OAuth PAR

The client application uses the standard OpenID Connect client and requires OAuth PAR for authentication. This is a new feature in .NET 9. The repo has a Razor Page OpenID Connect example as well as an MVC client sample. This would be the same for a Blazor application.

services.AddAuthentication(options =>
{
    options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme;
    options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme;
})
.AddCookie()
.AddOpenIdConnect(OpenIdConnectDefaults.AuthenticationScheme, options =>
{
    options.Authority = authConfiguration["StsServerIdentityUrl"];
    options.ClientSecret = authConfiguration["ClientSecret"];
    options.ClientId = authConfiguration["Audience"];
    options.ResponseType = "code";
    options.Scope.Clear();
    options.Scope.Add("openid");
    options.Scope.Add("profile");
    options.Scope.Add("email");
    options.Scope.Add("offline_access");
    options.ClaimActions.Remove("amr");
    options.ClaimActions.MapJsonKey("website", "website");
    options.GetClaimsFromUserInfoEndpoint = true;
    options.SaveTokens = true;
    options.PushedAuthorizationBehavior = PushedAuthorizationBehavior.Require;
    options.TokenValidationParameters = new TokenValidationParameters
    {
        NameClaimType = JwtClaimTypes.Name,
        RoleClaimType = JwtClaimTypes.Role,
    };
});

Notes

.NET Aspire looks great and is easy to use in development. I am still learning the details: I have some open issues with the containers and HTTPS, I don't yet fully understand how the configuration works, and I am not sure how this would work in production. Lots to learn.

Links

https://www.keycloak.org/

https://www.keycloak.org/server/features

https://github.com/NikiforovAll/keycloak-authorization-services-dotnet

https://openid.net/specs/openid-connect-backchannel-1_0.html

https://github.com/dotnet/aspire-samples/tree/main/samples

https://learn.microsoft.com/en-us/dotnet/aspire/get-started/aspire-overview

Sunday, 01. September 2024

Wrench in the Gears

Childhood Memories, An Unexpected Artifact, And Imaginal Openings

I write this post on the first anniversary of the death of my dear father Jerry Lee Hawver who grew up in an unstable, alcoholic household and through hard work and great people skills became a star athlete and executive with Procter and Gamble. He and my mother raised my brother and me in the [...]

I write this post on the first anniversary of the death of my dear father Jerry Lee Hawver who grew up in an unstable, alcoholic household and through hard work and great people skills became a star athlete and executive with Procter and Gamble. He and my mother raised my brother and me in the corporate suburbia of the New South of the 1970s and 80s, steadily climbing the ladder of the American dream. For my formative years that was my culture. The dynamic in our home was that I was the child who sought approval. My brother pushed the boundaries. Over time he and I have come together to meet in the center. We are both Hawver stock, and we “get ‘er done.”

In those early years of the feral-child 1970s, us kids had the run of the neighborhood yards and wood lots. I’ve spoken of hanging out by the creek lined with quartz-laden rip-rap looking for crawdads and swinging from giant vines and making “forts.” My best friend’s name was Andrea Hopkins. She and her brother Brandon lived in a house that was kitty-cross from our backyard. Her father had a British accent and they had a few antiques, which I thought made them rather a glamorous family. We would often spend the afternoon exploring, riding our Big Wheels around, playing freeze tag, and putting on plays in the garage. Sometimes Andi and I would get crafty and make little coil pots out of the red Kentucky clay and fill them with potions made of petals from her mom’s flower beds along her driveway. Of course, there were marigolds in the mix. In some ways my new “cottage” rancher is like a return to this liminal space of childhood imaginative possibility. 

I realize now that early success is not necessarily a blessing. I did well in school after a disastrous second-grade year with a body-building teacher, Mrs. Weigand, in an open classroom school I had to take three buses to get to (desegregation). I have a strange memory of being shown a graphic movie of the horrors of smallpox in a tiny room with carpeted levels. That made no sense to me, but I later found out that the Flexners (of the Carnegie / Rockefeller Flexner Report) were from Louisville, so maybe we were part of some public health behavioral experiment. 

I recovered the following year at the much more traditional Wilder Elementary School where we had a fabulous librarian who would take small groups of kids around the shelves and do short book pitches and place books into the hands of kids she knew would love them. It’s funny that there were quite a few paranormal kids’ books in the 70s – John Bellairs and Zilpha Keatley Snyder. The one book that really stayed with me was Margaret Anderson’s “To Nowhere And Back,” a time-travel adventure where a girl passes through an English hedge into another century to make friends with a poor servant girl on the other side. 

I was never in the popular crowd, but I always had a circle of friends. Without realizing it, my identity became wrapped up in my idea of performance, of being a “good” person based on what society said good was. My life unfolded with relative ease, meeting my husband in college, raising a small family, doing my part to make my neighborhood and city a better place, so I thought. But things started to change when my child was in middle school as I began to realize society was being broken on purpose, and the social systems I’d built my identity around began to give way one by one. Eisenhower’s chosen head for ARPA was the President of Procter and Gamble, Neil McElroy? Oh, and they ran contracts building nuclear missiles in the panhandle of Texas (Pantex)? You don’t say? Well that sure put Mr. Whipple and Mrs. Olson in a new light for me. If one’s identity is wrapped up in societal expectations and those start falling away, how does the center hold?

What does it mean to be a “good person” within the expectations of a profoundly flawed system? Here my parents had worked so hard to walk us up the mountain of success, who was I to pull off on a wayside and kick rocks? I was the good girl, wasn’t I? When my marriage fell apart and my child became estranged from me, the silver lining was the opportunity to begin to shed the “good girl” archetype. Not that I had become a “bad girl,” it’s just that I recognized such labels were insufficient and did not do justice to the richly evolving tapestry of my life. Our lives have dramatic arcs. I know it sounds trite, but if there are no bumps in the road, you can’t truly appreciate the parts when the ride is smooth and clear. 

Last year we were trying to sort out the memorial service for my father. At the time I had been applying to all sorts of jobs with the idea that I would stay in Philadelphia and try to hold onto our family home that was almost paid off. In my perceived role of the “good mother,” I thought at some point my young adult child might return from the west coast and need a place to land. Life laughed at me and had the literal soles fall off my shoes on the walk to one interview at Wharton.

During the two-day drive from Philadelphia to North Carolina to be with my father in hospice I cried a lot. I thought about his strength, both physical and strength of character. He was a big man, and I was small, but we fit together, a chip off the old block. I was listening to a podcast this week on our relationship to death and ancestors and Perdita Finn said that sometimes when people go, it is because they can help the ones they love more from the other side. I truly believe my father has been with me every step of the way since that morning when I held his swollen big hands and I read to him from L’Engle’s “An Acceptable Time” and bathed his forehead in kisses, my cheeks wet, and he crossed over. 

After a hurtful email exchange with my child and husband about the memorial service, it was clear that I was being removed from their lives with strong intention and there was nothing I could do about it. I am a planner. Hawvers like to be in control. We try to fix things. All of those things were off the table. It was up to me to build a new story outside of all the stories that had defined my life since I met my husband during our study abroad program in Venice in 1989. I’d never been alone, and now I was going to figure out how to be me, for me, not the good daughter, the model student, the diligent mother and wife. I told my husband I was going to leave Philadelphia and get a cozy cottage where I could garden, and a kayak. I spoke it out loud, a reasonable wish, with conviction, and with a few days to spare I got it all squared away.

It’s taken me a year to come around to this way of thinking, because there was a lot of pain to acknowledge and process. But now I realize I was being given space to unfold in a new way. I had a choice. I could hold onto the baggage I was carrying and imagine myself as a failure across many categories. The thing is, I actually like myself and I don’t feel shameful about the course I’ve taken. Regrets? Yes, of course I have some. Have I made missteps? Certainly. But not out of malice, out of lack of information, insight, and wisdom. I deeply miss the people I have lost, but I can learn to be, to become, outside of them, outside their stories and expectations, and that’s ok, too. Perhaps some day they will have grown in their own stories enough and there will be enough space to hold all of who we have become in some new kind of family. Or maybe that won’t happen in this version of the story. I have to learn to accept what comes with an open heart.

Before he left this Earth, my father navigated Alzheimer’s for about six years with numerous falls, escapades, ditched cars, eating light bulbs, jumping back yard fences, and taking apart cable boxes at his nursing home. As I wade into a broader understanding of consciousness, I open up space in my heart for what was really happening with my dad and his glitching interface with what we understand as reality. 

Over the course of my research this past year I found a very interesting memoir written by Elmer Green of the former Menninger Clinic in Topeka, KS where my dad grew up. It’s called “The Ozawkie Book of the Dead: Alzheimer’s Isn’t What We Think It Is” (PDF) and it is the story of how he cared for his wife Alyce, his research partner in biofeedback, during her journey through that illness. It involves the nature of the soul and interpenetrating realms and the process of learning and unlearning how to be in this space.

I think I’m going to read it again, because as terribly traumatic as it was to see my strong, capable father engage with the world in ways that were so extreme and problematic and inscrutable, having a year’s perspective and new awareness of proto-consciousness and the unified field and emergent complexity, I can sit with the hard parts of the process and realize that maybe what my father’s struggles were doing was giving me a window to look through for the answers I seek. 

I was under the weather for about a week. It started with a vicious migraine and then some shooting nerve pain and gradually resolved into a typical cold. Let me tell you, it’s not fun to be feverish in Arkansas in August. Most of the month the daytime highs have lingered in the upper 90s. By the end of the week I was regrouping, less foggy headed, and I decided to take a hike to Balanced Rock and make a heart to honor Jerry Hawver’s life. The trail is on the other side of town. It’s only a three-mile walk, and the idea of balance seemed appropriate. A friend had sent a delightful care package with a hummingbird card and dried herbs from her garden in Arizona. I clipped the hummingbird and took out a selection of peppers to place in my wicker basket along with a heart trimmed from the collar of one of his shirts. It still smells of his aftershave.  

I drove past the bath houses downtown then parked my Subaru on the gravel pull out along Cedar Glades Road, quickly crossed the blind curve, and started the climb. The online reviews said that the first half mile took some effort. It was steep, sure, but the challenge was walking on loose scree. For years I never gave much thought to being particularly careful while I was out and about, because there were always at least two of us. Now, it’s just me. So I have to be watchful about taking a misstep and turning an ankle. I took things slowly and on my way up I picked up a chunk of Novaculite that seemed to have been worked into a hand sized cutting tool or scraper. Idly I ran my fingers over it as I pressed on through the dry, dusty woods. 

Where it leveled out, I paused for a moment and a wild red and black bug caught my eye. When I looked it up later, it turned out to be a velvet ant, a parasitic wasp species where the females are wingless. A minute later, I saw her partner flying down the trail. Fortunately I had the good sense to keep my distance, as I later found out their sting is many times worse than a hornet’s and they are commonly called “cow killers.” 

An interesting synchronicity is that in one of my early blog posts I warned about hybrid “personalized” learning being like a cicada killer. A cicada killer is a large wasp that frequented the garden where I worked. Its reproductive cycle involved capturing its chosen prey on the wing, paralyzing it, dragging it into an underground burrow, and then laying eggs on the body where the larvae would eat their way out. It was rather grotesque imagery, but it served to make a point. Well, turnabout is fair play, and evidently these velvet ants parasitize the pupae of cicada killers. Hmm.

Despite the dryness you could still find a few ferns, and where the side trail to Balanced Rock branched off there were some wild plums dropping onto the path. I gathered a few things en route to supplement what I’d been gifted and brought from home. Not long after I arrived at the outcropping – more loose scree, more careful steps. I managed to get far enough down the slope to fully appreciate the balance embodied by these two impressive boulders. 

Of course right as I got there, a couple arrived panting, and so we gave each other space. I pulled over to a little overlook in the shade and pulled out my basket of offerings. The arrangement of items in this particular heart is both loose and disordered while being vibrant and tactile. There are cracked nuts full of labyrinths, fractal turkey tail fungus, the first fall leaf offerings, and peppery spice. 

I think that would suit my dad just fine. By the end everything was coming apart, which is a difficult thing for Hawvers. Have I said before that Hawvers like to be in control and have a plan? Well, we do. And the last years of Jerry’s life were anything but that. At the time there was a heaviness about it, this uncontrollable fraying and sense of loss that would not be diverted, but a year later, perhaps I can reimagine it as a cocoon tearing apart the restricting expectations my dad must have felt to be a provider and a fixer and the rock for all of those years. To transition to a new form meant things were going to fall apart. How else is renewal possible?  In the looseness of MIA control systems there is the unrealized potential of a new blueprint and within the chaos, a kernel of what’s next.

I sat on that bit of bluff, and thanked my dad for all the things he provided for us. I can feel him giving me a big bear hug and me offering in return a big back scratch. I told him that I missed him deeply and thought of him every day. I also said that I knew he had been walking beside me, but that I was going to be ok, that if he felt held back in any way by me, he could go. I can just picture him high-fiving people all over heaven. I prayed that the creator would help me have the clarity to see the signs and find my purpose as I unfold into this new life. Then I packed up and headed back down the trail. 

The walk back was uneventful with one exception. Near the far end I paused before an unusual artifact, a rusted out wash basin hanging from a knot in a tree at eye level, almost where a shaving mirror would be. It had been galvanized, but the rust had gotten to it and the entire bottom was eroded away. I lifted it off the knot to examine it more closely. It wasn’t large, a bit wider than a typical dinner plate and about four inches deep with a narrow rim punctured in one spot where you could hang it from a nail. 

Now it wasn’t a totally bizarre addition to the landscape. You might find something like that tossed in a ravine out behind an old farmhouse. But there was no other debris around, no trash. I found it right along a trail on land that had been a national park for almost a century. It wasn’t tucked away in the brush; it was right in front of my face. It wasn’t a plastic bottle or even a face mask carelessly tossed trail side. No, this was a message for me, this wash basin, that if it weren’t for the rusted out bottom, would be right at home as part of a Cracker Barrel wall display.

Ok, so I had asked for the wisdom to notice the signs, but then what was I to make of this out-of-place, out-of-time basin? One story you might tell is one of obsolescence. When my life first fell apart during the lockdowns, that was the story I initially jumped into. 

Who am I now that I am no longer my child’s mother, my husband’s wife, my mother’s accomplished daughter? I was a basin whose design was to hold things for others. Basins don’t function alone, they contain things. For decades my identity had contained many things, big and small. From carpool schedules and family vacation plans, to dinner menus and school registration procedures, civic campaigns, etc. etc. My existence was to be a helpful presence. Then somehow I picked up some rust. Perhaps at that school report card meeting in the summer of 2013. Rust wore away at the old functionality of the basin. Interestingly rust interferes with magnetism in metal. So as the basin’s rust surface area increased, it became less polarized. I imagine that might represent a shift towards non-dualism. 

Was I meant to believe I was a failed basin? No, of course not. Because I believe in a benevolent creator that is loving and wants what is best for us. So the story I am choosing to tell myself about the basin is one that Alison Hawver might have imagined in third grade in Mrs. Nevius’s cardboard geodesic dome reading nook sitting on a carpet square. I am going to imagine that like the ugly duckling becoming a graceful swan, I have been a capable and trustworthy basin for many years, but slowly grew into a magical portal of imaginative potential, which given the state of the world right now, is a pretty awesome skill to have on offer. The twenty-first century needs some wiggle room for new possibilities to emerge, some fluidity to flex the boundaries of spacetime.

I’ve traded in the closed, hygienic system of wash basin culture for the jagged, unexpected potential of a back-to-nature, quantum vacuum, anti-mirror portal into proto-consciousness where we can dip in for messages from worlds with a better view than ours. A portal like the cottage hedge in the book “To Nowhere and Back” or the star watching rock from L’Engle’s Time Quintet that I read from at my father’s death bed.

Well, I’ve written all day, and the heat has finally broken with a gentle rain and a cool breeze here on the crystalline flank of Spa City. I hope you haven’t found this too self-indulgent or maudlin. There are lots of big feelings to navigate these days for all of us, and sometimes it’s useful to remember back to the times when all we had to do was make petal potions with our backyard buddies. If you quantum foam some interesting artifacts into your life, I’d love to hear about them and the story that goes with them. Thanks for hanging out with me on this Labor Day weekend 2024. A toast to Jerry Hawver. I wouldn’t be who I am without him.


A Soft Landing on Marigold Circle

Dear readers, For those who are still following along, you’ve probably noticed it’s been quiet on the blog this summer. I did a short update on my channel a few weeks ago letting folks know I found a cozy home, a 1970s two-bedroom brick rancher on about a quarter acre on the outskirts of Hot [...]

Dear readers,

For those who are still following along, you’ve probably noticed it’s been quiet on the blog this summer. I did a short update on my channel a few weeks ago letting folks know I found a cozy home, a 1970s two-bedroom brick rancher on about a quarter acre on the outskirts of Hot Springs, AR. It is a sturdy little house with a sunny yard and quiet neighbors, mostly retirees who are kind about looking out for you without intruding and generously extend weekly invitations (in season) to share enormous slabs of delicious ice cold watermelon. It’s nothing fancy, but after a month I can say it suits me perfectly, and I am grateful for the blessings of having a safe place to embark on this next phase of life.

My snail mail address is 123 Marigold Circle Hot Springs, AR 71913. I’ve already received a few letters and care packages, which are much appreciated. I’m loving the slower pace of things in my chosen exile. Maybe I can have a go setting up some old school pen pal correspondences if you care to drop me a line. 

Marigold Circle is a loop at the end of Hobson Road that is tucked into the base of one of the ridges underlaid by that famous quartz crystal that comprise Hot Springs National Park, the oldest federally protected land in the country (1830s compliments of Andrew Jackson). This development of about thirty homes, locally known as “Little Chicago” is situated where West Mountain begins to transition into Music Mountain. For years Hot Springs was known as a mobsters’ hang out, some with Chicago ties. Oh, and the circle is a square (not a cul-de-sac), and I’m in the middle of the doughnut (aka the quantum vacuum, wink, wink).

I’m loving the breathing room provided by the small open lawn around my house that gives me space to take in the blue Arkansas skies, often punctuated by dramatic towering clouds. There is considerably less geoengineering here than in Philadelphia. It does happen, but much less. The forest that embraces our little development is a mix of pine and hardwoods, so I should have nice vistas at all times of the year. It’s not an off-the-grid, bug-out place; but it is quiet, there are good people here, and lots of places to hike and get out on the water. I bought an inflatable stand up paddle board that can convert into a basic kayak, which will be easier for me to transport. Now that I’ve got my life vest, I’m going to head out AFTER the Labor Day boating hoopla and soak up a bit of the calm of the shoreline of Lake Ouachita and the 1.8 million acres of national forest surrounding it. Water is the theme of the year.

My yard doesn’t have trees, but my neighbors do – a sweetgum and sycamore on the south, a maple to the north. It’s been a very hot month with daytime temps rarely getting out of the upper nineties and almost no rain. The poor gum tree, in particular, seems quite stressed. Having lived in a row house for the past thirty years, I didn’t own a rake, just a broom for the front walk, so I grabbed a solid wooden-handled Craftsman at Lowe’s this week. Even though I got a blower with my lawn mower purchase, I do hate the sound of them. For me Hot Springs is a great chance to slow down and get back to the basics. I slowly filled half a contractor bag with leaves yesterday and look forward to getting a better composting situation set up soon.

There’s a towering “axis mundi” oak across the street. It has been trimmed back by the power company, so it appears almost split in half, its impressive fractal branching pattern on full display, which is right in line with my recent musings on complexity and fractals. This particular specimen is over sixty feet high. Given the severe weather you can get here, I recently took out a measuring tape to see how much distance there was between its base and my eaves. Hopefully, if it ever falls, only the top ten feet will hit my house, enough to mess things up, but not totally destroy the house.

The one poor choice I’ve made so far (that I know of) is that I initially bought a terribly cheap, uncomfortable twin mattress. I don’t have sleep issues or chronic pain and don’t consider myself a mattress snob. Our old queen mattress that we had for over twenty years  held up like a champ. It was just a basic Serta or something, but of course they don’t make them like they used to. I ended up buying a replacement with natural materials that should be arriving next week. I’ll put the old one in a mattress bag in the basement for tornado season. It’s the perfect use for a sad excuse of a sleeping surface, perhaps one day padded protection from falling timbers. If that badly-pruned oak ends up taking a nosedive through my living room, it could end up being the hero of my story.

My house is on a slightly fall-away lot, which is why I have a basement. Happily, it is still level enough to garden once I get a  fence and raised beds set up to deter the many deer and bunnies. All the houses are modest one-stories, and so my little back sun porch has the feeling of a tree house. The neighbors feed the birds (and the deer). Most days I feel I’m simply a visitor in their avian kingdom. Marigold Circle is an ideal woodland edge habitat. There is a raucous group of jays, daredevil mocking birds and thrashers, lots of flickers, turtle doves, cardinals, house finches, and Carolina wrens. Some evenings or early mornings I get lucky to catch a glimpse of a hummingbird probing the magenta trumpets of the Four O’Clocks in my front dooryard. There are lots of dragonflies and butterflies and wasps, too. The wasps pollinate the big turkey fig out back that has been offering me a handful of sweet fruits every morning since I moved in. 

I mentioned in my video that I’m weighing the extent to which it makes sense for me to continue to share my research moving forward. I’ve pretty much unplugged. I’m not on social media outside of the interactions I have on my channel and blog. I don’t have a television or radio. I have a portable “suitcase” record player and am gradually collecting bluegrass and folk on vinyl. I don’t check the news online. What information filters in comes to me through my small circle of friends. I’m less interested in blogging as a contact / “gotcha” sport, as tempting as that is with all the things we’ve been talking about now rolling out including Crypto Bobby, Trump’s new sidekick. 

You might think that is irresponsible of me to check out, but I have come to the realization that I am not going to change the world through social activism or civic engagement. Trying to convince people to pay attention to the story I have attempted to surface doesn’t work. The learning curve is too steep and the feed does not operate in my favor. That said, maybe there is another way to tweak the seemingly chaotic but perhaps not entirely complex labyrinth of what we understand to be reality. Maybe it is my task for now to try and put my intuition and intellect into puzzling out some cheat codes and rather than throwing a wrench in the gears to stop the progress of so many problematic programs, learn to play living systems instrumentally with heart by adjusting initial conditions and tuning constraints. It may be a fool’s errand, but for now I turn inward – not giving up, but handing over control to the field of imaginative possibility as I try and train my eyes to see the clues and piece them together. Being quiet and reading and sewing for the time being is enough. I have put in my time trying to be loud. In the end it was unsatisfying, and even destructive to my wellbeing.

The idea of responding to breaking developments isn’t satisfying. Neither is critiquing personalities or asserting the superiority of my lens. Mostly, I want to continue to follow the intellectual meanderings that feed my soul. Neal Stephenson’s novel, “Anathem” revolves around a society of monastic thinkers who remove themselves from the world, so they can focus on weightier contemplations. I can see the logic in that. Stepping outside the noise has brought me a measure of peace and calm. Walking around barefoot, something I couldn’t do in Philadelphia, has felt great and is very grounding, too! 

The physics of digital information is different. I think it comes so fast and furious and disconnected from our actual lived experience that it overwhelms our sense-making instincts both individually and collectively and ends up leaving us anxious and adrift. I feel like I know enough for the moment about the basics of web3, the mechanics of social impact policy and digital governance, and the Internet of Bio-Nanothings. Now I want to better understand the realm that we are in. Not in an esoteric, occult way that appeals to many; more like a Claude Shannon, information theory way. I think energy flow and information is the key to traversing the choice architecture of inner space. The more I discover, the less certain I am about anything. I do think there is merit in putting my thoughts down in writing with a date and time attached, even though I’m less and less invested in the concept of linear time. That said, the postings here will probably continue to be sparse. 

The topics I currently find to be most compelling are quantum biology, proto-consciousness, complex non-linear dynamical systems, morphic fields, parallel processing, distributed collective intelligence, the observer effect, narrative structures, spatial computing, responsive environments, digital artifacts and deep learning. I am dipping into a lot of information flows along with friends and I’m not exactly sure what to say about my thought process yet. In addition to Karl Friston’s Free Energy Principle and Chris Fields’ Active Inference work, I am also very interested in the ideas of the late Jungian psychiatrist, dream analyst, and quantum consciousness theorist Gordon Globus. I am wrapping up his 2009 work “The Postmodern Mind” now.

I have a lot of ideas about non-traditional “computing” interfaces and humans-in-the-loop that have less to do with nanotechnology than with Qi, psi and plasma and etheric realms. As the head of standards for IEEE once said – it’s really all metaphysics! 

Minimizing surprise and making solid predictions is an evolutionary imperative, according to Karl Friston’s free energy principle. We hold a model of how the world works in our heads, so that we can use it to make good choices navigating uncertainty. Many of those choices are fluid and socially normed. Some people are more comfortable changing their model than others. Some people deal with surprise by looking away. Then there are the odd balls like me who keep hoovering up information and trying to find patterns in it so that we can conjecture about stories that could make the whole messy pile hold together for a time at least, because there are always more surprises awaiting us.

It seems in my case, I became too much of a surprise for my beloved family, and the perceived cost of changing their models to accommodate my admittedly fringe world view was too high. I can’t say they were wrong. I will simply say I didn’t realize with each small step, new nugget of information, and adjustment that I was being removed from their world model. I simply did not appreciate how profound the cost would ultimately be. 

I am still working to make sense of what happened to me in a way that is non-judgemental and centers compassion for all involved. I do believe understanding the power of the heart field is vital to navigating the turbulence of the world – not just in a sweet, saccharine sort of way but in a quantum processing sort of way that connects us straight to the divine nature of the cosmos. 

I have another blog post I plan to work on tomorrow in honor of the year anniversary of my father’s passing. I’ve been under the weather for about a week, but it seemed to be resolving so a few days ago I took a short out and back hike to Balanced Rock on the other side of downtown. On the return leg I encountered an unusual artifact, which felt like it had some symbolic significance. My next writing will be a reflection on that.

I’ll close with an entertaining and informative talk my friend Sean posted this week about artifacts inspired by a discussion I’d posted with Lynn on Open Education Resources. Sean’s talk features a read aloud from a Mrs. Frizzle and the Magic School Bus book sponsored by the Department of Energy. It’s a good format, be sure to check it out.

 

All the best,

Alison McDowell

123 Marigold Circle

Hot Springs, AR 71913

PS: I just learned that marigolds go by the Latin name Tagetes erecta, which comes from Tages the Etruscan deity of divination. West is the direction of the underworld in Greek mythology, though I continue to imagine these other realms as being more about signals and information, which means it’s interesting that the ridge above me is West Mountain and Music Mountain. 


Aaron Parecki

Thoughts on the Resiliency of Web Projects

I just did a massive spring cleaning of one of my servers, trying to clean up what has become quite the mess of clutter. For every website on the server, I either:

I just did a massive spring cleaning of one of my servers, trying to clean up what has become quite the mess of clutter. For every website on the server, I either:

Documented what it is, who is using it, and what version of language and framework it uses
Archived it as static HTML flat files
Moved the source code from GitHub to a private git server
Deleted the files

It feels good to get rid of old code, and to turn previously dynamic sites (with all of the risk they come with) into plain HTML.

This is also making me seriously reconsider the value of spinning up any new projects. Several of these are now 10 years old, still churning along fine, but difficult to do any maintenance on because of versions and dependencies. For example:

indieauth.com - this has been on the chopping block for years, but I haven't managed to build a replacement yet, and is still used by a lot of people
webmention.io - this is a pretty popular service, and I don't want to shut it down, but there's a lot of problems with how it's currently built and no easy way to make changes
switchboard.p3k.io - this is a public WebSub (PubSubHubbub) hub, like Superfeedr, and has weirdly gained a lot of popularity in the podcast feed space in the last few years

One that I'm particularly happy with, despite it being an ugly pile of PHP, is oauth.net. I inherited this site in 2012, and it hasn't needed any framework upgrades since it's just using PHP templates. My ham radio website w7apk.com is similarly a small amount of templated PHP, and it is low stress to maintain, and actually fun to quickly jot some notes down when I want. I like not having to go through the whole ceremony of setting up a dev environment, installing dependencies, upgrading things to the latest version, checking for backwards incompatible changes, git commit, deploy, etc. I can just sftp some changes up to the server and they're live.

Some questions for myself for the future, before starting a new project:

Could this actually just be a tag page on my website, like #100DaysOfMusic or #BikeTheEclipse?
If it really needs to be a new project, then:
Can I create it in PHP without using any frameworks or libraries? Plain PHP ages far better than pulling in any dependencies which inevitably stop working with a version 2-3 EOL cycles back, so every library brought in means signing up for annual maintenance of the whole project. Frameworks can save time in the short term, but have a huge cost in the long term.
Is it possible to avoid using a database? Databases aren't inherently bad, but using one does make the project slightly more fragile, since it requires plans for migrations and backups.
If a database is required, is it possible to create it in a way that does not result in ever-growing storage needs?
Is this going to store data or be a service that other people are going to use? If so, plan on a registration form so that I have a way to contact people eventually when I need to change it or shut it down.
If I've got this far with the questions, am I really ready to commit to supporting this code base for the next 10 years?

One project I've been committed to maintaining and doing regular (ok fine, "semi-regular") updates for is Meetable, the open source events website that I run on a few domains:

events.indieweb.org
events.oauth.net
events.scim.cloud

I started this project in October 2019, excited for all the IndieWebCamps we were going to run in 2020. Somehow that is already 5 years ago now. Well that didn't exactly pan out, but I did quickly pivot it to add a bunch of features that are helpful for virtual events, so it worked out ok in the end. We've continued to use it for posting IndieWeb events, and I also run an instance for two IETF working groups. I'd love to see more instances pop up, I've only encountered one or two other ones in the wild. I even spent a significant amount of time on the onboarding flow so that it's relatively easy to install and configure. I even added passkeys for the admin login so you don't need any external dependencies on auth providers. It's a cool project if I may say so myself.

Anyway, this is not a particularly well thought out blog post, I just wanted to get my thoughts down after spending all day combing through the filesystem of my web server and uncovering a lot of ancient history.

Thursday, 29. August 2024

Aaron Parecki

OAuth Oh Yeah!

The first law of OAuth states that

The first law of OAuth states that the total number of authorized access tokens in an isolated system must remain constant over time. Over time.
In the world of OAuth, where the sun always shines, Tokens like treasures, in digital lines. Security's a breeze, with every law so fine, OAuth, oh yeah, tonight we dance online!
The second law of OAuth states that the overall security of the system must always remain constant over time. Over time. Over time. Over time.
In the world of OAuth, where the sun always shines, Tokens like treasures, in digital lines. Security's a breeze, with every law so fine, OAuth, oh yeah, tonight we dance online!
The third law of OAuth states that as the security of the system approaches absolute, the ability to grant authorized access approaches zero. Zero!
In the world of OAuth, where the sun always shines, Tokens like treasures, in digital lines. Security's a breeze, with every law so fine, OAuth, oh yeah, tonight we dance online!
Tonight we dance online! OAuth, oh yeah!
Lyrics and music by AI, prompted and edited by Aaron Parecki

Wednesday, 28. August 2024

Phil Windleys Technometria

Digital Identity and Access Control

In response to a post on X about China's social credit system, Paul Conlon said:

In response to a post on X about China's social credit system, Paul Conlon said:

Digital ID is ultimately about access control where those who impose the system are the ones determining what you are required to be and do.

Provision of resources and liberties become conditional upon the whims of the affluent. Doesn't sound safe or convenient to me.

From X
Referenced 2024-08-28T08:10:31-0400

The way Paul said this struck me because I've been thinking a lot about access control lately. I believe that we build identity systems to manage relationships, but, as Paul points out, the ultimate utility of identity systems in many cases is access control.

This isn't, by itself, a bad thing. I'm glad that Google controls access to my Gmail account so that only I can use it. But it doesn't stop there. If I use my Google account to log into other things, then Google ultimately controls my access to everything I've used it for. This is federation's original sin.[1]

Paul's comment points out the primary problem with how we build identity systems today: when access control is centralized, it inherently shifts power towards those who manage the system. This dynamic can lead to a situation where individuals must conform to the expectations or demands of those in control, just to maintain their access to essential services or resources. While we often accept this trade-off for convenience—like using Google to manage multiple logins—the broader implications are troubling.

The more we rely on federated identity systems, with their tendency to centralization, the more we risk ceding control over our digital lives, reducing our autonomy and increasing our dependence on entities whose goals may not align with our own. This is why the principles of self-sovereign identity (SSI) are so compelling. SSI proposes a model where individuals maintain control over their own identity, reducing the risks associated with centralized access control and enhancing personal freedom in the digital realm.

Critics of SSI will claim that giving people control over their identity means we have to accept their self assertions. Nothing could be further from the truth. When someone wants me to prove I'm over 18, I use a driver's license. The state is asserting my age, not me. But I'm in control of who I show that to and where. Sovereignty is about borders and imposes a system of relationships.
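
The division of labor is easy to miss, so here is a deliberately simplified sketch in TypeScript, not any real verifiable-credential format, showing the idea: the issuer signs the claim, the holder merely chooses when and where to present it, and the verifier checks the issuer's signature rather than trusting a self-assertion. The names and the in-process key handling are illustrative only.

import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Deliberately simplified stand-in for a credential about the holder's age.
interface AgeCredential {
  subject: string;
  over18: boolean;
  issuer: string;     // e.g. "dmv.example.gov" (hypothetical)
  signature: string;  // issuer's signature over the claim
}

// For the sketch, the issuer's key pair lives in the same process.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// The state (issuer) signs the claim about the holder; the holder does not assert it.
function issue(subject: string, over18: boolean): AgeCredential {
  const payload = JSON.stringify({ subject, over18 });
  const signer = createSign("sha256");
  signer.update(payload);
  return { subject, over18, issuer: "dmv.example.gov", signature: signer.sign(privateKey, "base64") };
}

// The verifier checks the issuer's signature; the holder only decided to present the credential here.
function verify(credential: AgeCredential): boolean {
  const payload = JSON.stringify({ subject: credential.subject, over18: credential.over18 });
  const checker = createVerify("sha256");
  checker.update(payload);
  return checker.verify(publicKey, credential.signature, "base64");
}

// Usage: the holder presents the credential to whichever verifier they choose.
const credential = issue("alice", true);
console.log(verify(credential)); // true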

Now, China could use SSI to build the social credit system. One credential, controlled by the state, that is used to access everything. SSI makes individual control structurally possible, but can’t guarantee it. Technology alone can't solve this problem. As a society, we have to want a digital world, modeled on the physical one, where individuals are the locus of control and use information and assertions from a variety of credentials to build and and interact in peer-to-peer relationships. Until we value freedom and independence in the digital world, we will yield up control of our digital lives to others who will act in their own interests, not ours.

Notes

[1] For similar reasons, I think federated social media systems are a bad idea too, but that's another blog post.

Photo Credit: Papers Please from DALL-E (public domain). Prompt: Draw a rectangular picture of police checking identity papers of people on the street


Bill Wendels Real Estate Cafe

Do HIDDEN listings trick homebuyers & sellers into double paydays, deny consumer savings?

Remember how much fun it was making up plays on the sandlot?  Ever try fooling the opposing team with a trick play — a hidden… The post Do HIDDEN listings trick homebuyers & sellers into double paydays, deny consumer savings? first appeared on Real Estate Cafe.

Remember how much fun it was making up plays on the sandlot?  Ever try fooling the opposing team with a trick play — a hidden…

The post Do HIDDEN listings trick homebuyers & sellers into double paydays, deny consumer savings? first appeared on Real Estate Cafe.

Tuesday, 27. August 2024

FACILELOGIN

How we built multi-regional routing at the edge @ DevRev

At DevRev, we are building an API-first dev-centric platform that gives you one unified view of your customers and products by connecting your data, teams, and users to our AgentOS platform and app experiences. This blog post shares some insights on how we built routing at the edge in our multi-regional deployment. The DevRev platform was initially deployed only in the US-East region. All data re

At DevRev, we are building an API-first dev-centric platform that gives you one unified view of your customers and products by connecting your data, teams, and users to our AgentOS platform and app experiences. This blog post shares some insights on how we built routing at the edge in our multi-regional deployment.

The DevRev platform was initially deployed only in the US-East region. All data related to organizations (or tenants) and their customers (workspaces) was stored in that region. This raised two concerns, data residency and latency, which pushed us to launch our first region outside the US, in Mumbai (ap-south-1), in March 2024; we are now in the process of expanding our presence into multiple other regions.

The DevRev platform is built with 250+ microservices running on Kubernetes. Building the multi-regional architecture was a significant collaborative effort between the infrastructure and engineering teams. We went through numerous cycles of design, experimentation, and iteration to refine our approach. While future blog posts will explore the various design decisions in greater detail, this post focuses specifically on how we implemented routing at the edge.

Building multi-regional routing involves designing a system that efficiently directs traffic across geographically distributed regions to optimize performance, reliability, and adherence to data residency requirements. For instance, if performance is the key priority, traffic should be routed to the region with the lowest network latency relative to the user’s location. However, this requires replicating business data across multiple regions, which, while beneficial for reducing latency, is not the primary focus of our multi-regional strategy. Instead, we leverage Fastly as our CDN to cache relevant business data at globally distributed Fastly PoPs, ensuring lower latency by bringing the data closer to the user.

The primary goal of DevRev’s multi-regional deployment is to ensure data residency. Regardless of the user’s physical location, their requests are always routed to their organization’s designated home region. The home region is the specific region that hosts the organization’s data, and this region is chosen by the user when the organization is first created. Each organization is permanently tied to its selected region, maintaining compliance with data residency requirements.

Before we delve deeper into the routing design, let’s take a quick look at the DevRev high-level architecture. As shown in the following diagram, all requests initiated from DevRev API clients are intercepted at the edge by the corresponding Fastly PoP. All API requests are first validated by a Compute service deployed in Fastly, at the edge, which rejects any unauthenticated requests there, so the backend DevRev services only need to handle authenticated traffic.

The DevRev platform is built with 250+ microservices running on Kubernetes. Fastly Compute@Edge serves as the entry point at the edge for all DevRev services. At the origin, the DevRev API gateway intercepts all incoming traffic. Once the gateway verifies the JWT accompanying an API request, it dispatches the request to the appropriate service.

The second level of enforcement is at the DevRev API Gateway. All traffic to the DevRev platform flows through the API Gateway, which centrally enforces authentication with JWT verification. The third level of enforcement happens at the service itself. We have implemented the service mesh pattern with Istio, and each Envoy proxy, which sits alongside the service in the same Kubernetes pod, verifies the JWT that comes with inbound traffic.

Multi-regional routing logic is built into the Compute service running on Fastly. This is the same service that does request validation (along with JWT verification) at the edge.

APIs can be invoked either by authenticated users or through anonymous requests. For instance, when you access the DevRev web app, certain static resources are loaded before the user is authenticated. Similarly, some generic feature flags, which are not user-specific and remain unchanged across sessions, are also fetched without requiring authentication. Another example is the API endpoint https://api.devrev.ai/dev-orgs.public-info.get?slug=your_slug. This endpoint allows the web app to retrieve organization metadata, which is essential for initiating the login flow.

Apart from authenticated and anonymous requests, there is also a third type of request that does not carry a JWT but instead carries some sort of hint indicating the region it needs to be routed to. For example, the WebSocket upgrade call carries an ID in the query string that indicates the region it belongs to. In the same way, the dev-orgs.create API carries a region hint in an HTTP header. Apart from the query string and the HTTP request headers, we also found some APIs that carry the region hint in the request body.

In summary, the routing at the edge has to deal with three types of requests:

The requests that carry a trusted JWT
The requests that carry a region hint in the query string, the header, or the body
The requests that carry neither a region hint nor a JWT
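
To make the three cases concrete, here is a minimal, illustrative Rust sketch of how an edge service might classify an incoming request before picking a region. The RoutedRequest type, the header and query-string names, and the id=<region>-… convention are assumptions made for this sketch only, not DevRev’s actual Compute code.

// Minimal, self-contained sketch (std only); not the actual DevRev Compute module.
#[derive(Debug)]
enum RoutedRequest {
    // Authenticated traffic: the region is derived from a claim inside the JWT.
    Jwt(String),
    // Unauthenticated traffic that carries an explicit region hint
    // (query string, header, or body in the real system).
    RegionHint(String),
    // Neither a JWT nor a hint: falls back to the default region.
    Anonymous,
}

// `authorization`, `region_header`, and `query` stand in for values read
// from the HTTP request at the edge; the names are illustrative only.
fn classify(
    authorization: Option<&str>,
    region_header: Option<&str>,
    query: &str,
) -> RoutedRequest {
    if let Some(bearer) = authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        return RoutedRequest::Jwt(bearer.to_string());
    }
    if let Some(region) = region_header {
        return RoutedRequest::RegionHint(region.to_string());
    }
    // Tiny query-string scan for a hypothetical `id=<region>-...` style hint.
    if let Some(region) = query
        .split('&')
        .find_map(|kv| kv.strip_prefix("id="))
        .and_then(|v| v.split('-').next())
    {
        return RoutedRequest::RegionHint(region.to_string());
    }
    RoutedRequest::Anonymous
}

fn main() {
    println!("{:?}", classify(Some("Bearer eyJ..."), None, ""));
    println!("{:?}", classify(None, Some("ap-south-1"), ""));
    println!("{:?}", classify(None, None, "slug=acme"));
}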

The following diagram shows what we built at a high level. At the edge, the Compute service finds the region for each of the above three types of requests and then dispatches the request to the corresponding backend. This happens at the Fastly PoP closest to the user, and the Fastly Compute service is aware of all the backends (or regions) available for dispatching requests.

The Compute service running in Fastly is an edge gateway that intercepts all requests. It is a module written in Rust, which compiles to WebAssembly and then runs as a Compute service in Fastly.

In the Compute service code, we have a router that handles a given domain. For example, the api.devrev.ai domain has its own router, and the app.devrev.ai domain has its own. Each router defines what needs to be validated in the request path, and a given router is attached to one backend. For example, the router behind the api.devrev.ai domain is attached to the AWS Global Accelerator for us-east-1 and sets the hostname to api.devrev.ai.

However, this router architecture was not sufficient to support multi-regional routing at the edge. Now we need to find the backend attached to a router at runtime, based on the request context. In other words, we need to resolve the backend based on the request context. To support this model, we introduced a new construct called a resolver. A resolver is registered against a path, and it knows how to find the corresponding region based on the request context. If there is no resolver, then the default resolver is used, which finds the corresponding region based on a claim in the JWT. The default resolver can only be used for authenticated requests. All unauthenticated requests (if allowed) are routed to the default region.
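
To illustrate the resolver idea, here is a small, hypothetical Rust sketch: resolvers are registered per path and map the request context to a region, and a default resolver falls back to a region claim carried in the JWT. The trait, struct, and field names (including the region claim) are assumptions made for this sketch, not the actual DevRev implementation.

use std::collections::HashMap;

// Illustrative request context; in the real service this would wrap the
// incoming request (headers, query string, parsed JWT claims, body).
struct RequestContext {
    path: String,
    jwt_region_claim: Option<String>, // e.g. taken from a decoded JWT claim
    region_hint: Option<String>,      // e.g. from a header or query string
}

trait Resolver {
    // Returns the region (backend) this request should be dispatched to.
    fn resolve(&self, ctx: &RequestContext) -> Option<String>;
}

// Default resolver: authenticated requests are routed by a JWT claim;
// everything else falls back to the default region.
struct DefaultResolver {
    default_region: String,
}

impl Resolver for DefaultResolver {
    fn resolve(&self, ctx: &RequestContext) -> Option<String> {
        Some(
            ctx.jwt_region_claim
                .clone()
                .unwrap_or_else(|| self.default_region.clone()),
        )
    }
}

// A path-specific resolver that trusts an explicit region hint.
struct HintResolver;

impl Resolver for HintResolver {
    fn resolve(&self, ctx: &RequestContext) -> Option<String> {
        ctx.region_hint.clone()
    }
}

// Router: resolvers registered against paths, plus the default resolver.
struct Router {
    resolvers: HashMap<String, Box<dyn Resolver>>,
    default: DefaultResolver,
}

impl Router {
    fn backend_for(&self, ctx: &RequestContext) -> Option<String> {
        match self.resolvers.get(&ctx.path) {
            Some(resolver) => resolver.resolve(ctx),
            None => self.default.resolve(ctx),
        }
    }
}

fn main() {
    let mut resolvers: HashMap<String, Box<dyn Resolver>> = HashMap::new();
    resolvers.insert("/dev-orgs.create".to_string(), Box::new(HintResolver));
    let router = Router {
        resolvers,
        default: DefaultResolver { default_region: "us-east-1".to_string() },
    };

    let ctx = RequestContext {
        path: "/dev-orgs.create".to_string(),
        jwt_region_claim: None,
        region_hint: Some("ap-south-1".to_string()),
    };
    println!("{:?}", router.backend_for(&ctx)); // Some("ap-south-1")
}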

In the previous section, we discussed how authenticated requests, which include a JWT, are routed to the backend based on a claim within the token. But who is responsible for adding this claim to the JWT? Let’s unravel that mystery!

As shown in the first diagram, the DevRev web app uses a trusted identity provider, Auth0. Auth0 does not store any user credentials; it simply acts as an identity bus. When a user creates an organization in DevRev, we internally use the Auth0 API to create the corresponding Auth0 organization and update the metadata with the unique organization ID generated by DevRev. This unique organization ID includes the corresponding regional identifier.

When a user logs into their DevRev organization via Auth0 (after being federated through their identity provider or a social identity provider), an action within Auth0 reads the corresponding organization ID and adds it to the token it generates. The Compute service at the edge then reads this claim, determines the corresponding region, and dispatches the request accordingly.

By implementing multi-regional routing at the edge, we ensure that routing decisions are made as close to the user as possible, directly within the request path, and in a transparent manner. Neither the client nor the user needs to be aware of the underlying regional architecture. Each region operates independently, without requiring awareness of other regions, which allows us to seamlessly add new regions without affecting the existing ones.

How we built multi-regional routing at the edge @ DevRev was originally published in FACILELOGIN on Medium, where people are continuing the conversation by highlighting and responding to this story.

Thursday, 22. August 2024

Patrick Breyer

Pirate Party rejects industry criticism of gamers’ citizens’ initiative against the death of video games

At the opening of the Gamescom trade fair, the Pirate Party rejects the criticism by the industry association Video Games Europe of the EU citizens’ initiative to protect video games from being remotely shut down by their publishers. The industry association calls the demanded continued operation of publisher-abandoned games by the community “unrealistic” because of the costs to companies and because private server operators allegedly could not guarantee data protection and security.

Dr. Patrick Breyer, former Member of the European Parliament for the Pirate Party, counters:

Publishers and the EU Commission must learn that video games are not canned goods that can simply be taken off the shelf and replaced with another product. They are part of our shared digital heritage and connect people over years and sometimes even decades. Continued operation by the community is therefore only logical.

There are already games that are successfully run, or kept running, by volunteers. Data protection law also applies to private volunteer operators. Gamers who consent to a transfer of their data and are informed about the new conditions should not be patronized over possible trade-offs and deprived of their favorite game entirely; they should be given a choice. The current situation, in which publishers reserve the right to arbitrarily shut games down for fully paying gamers at short notice, from one day to the next, is untenable.

The citizens’ initiative is aimed specifically at the arbitrary shutdown of computer games by publishers, but this practice has long ceased to affect games only. Other software and digital devices, up to and including trains, are being remotely disabled as well. We Pirates want to stop this abuse of rights and legally enable the continued operation and further development of orphaned software by the community. This is also in the interest of sustainability.

The Pirate Party strongly supports the European citizens’ initiative “Stop Killing Videogames” and calls for it to be signed. The initiative aims to ensure that video games do not become unplayable when publisher support ends. Shortly after its launch, it has already reached about 300,000 of the 1 million signatures required.

“Initiatives like ‘Stop Killing Videogames’ are essential to protect players’ rights and preserve the diversity of gaming culture. We Pirates demand that supposedly unprofitable games not be deactivated but instead be made available to the public as open source, enabling private servers, for example,” explains Lukas Küffner, chairman of the Pirate Party Germany.

In response to an inquiry by Breyer, the EU Commission had confirmed that there is so far no protection against arbitrary shutdowns by publishers.

More information on the citizens’ initiative:
Info website
Recording of the Pirate Party webinar
Pirate Party info

Tuesday, 20. August 2024

Jon Udell

Building a Steampipe Plugin — and Powerpipe Dashboards — for WordPress

“Users of the WordPress API may enjoy the abstraction — and standardization — that a SQL interface provides. If you need to query multiple WordPress sites, Steampipe’s connection aggregator will be really handy. And if you want to integrate data from WordPress with data from other APIs wrapped by other plugins in the Steampipe hub, performing literal SQL JOINs across disparate APIs is a heady experience.” — Building a Steampipe Plugin — and Powerpipe Dashboards — for WordPress

Part of the LLM series at The New Stack.

Thursday, 15. August 2024

Heres Tom with the Weather

An error occurred

Classical pianist Jayson Gillham was pulled from an upcoming performance in Melbourne after dedicating a piece of music to the journalists of Gaza. The Melbourne Symphony Orchestra now says the cancellation was due to an error and they appear to be rescheduling the concert.

Monday, 12. August 2024

Damien Bod

Add a Swagger UI using a .NET 9 Json OpenAPI file

This post shows how to implement a Swagger UI using a .NET 9 produced OpenAPI file. The Swagger UI is deployed to a secure or development environment and is not deployed to a public production target. Sometimes, it is required to deploy the Swagger UI to a development deployment target and not the test or the production deployments. The security headers need to be weakened to allow the Swagger UI to work.

Code: https://github.com/damienbod/WebApiOpenApi

Setup

The post uses the OpenAPI Json created by a .NET 9 ASP.NET Core application. See this blog:

Implementing an ASP.NET Core API with .NET 9 and OpenAPI

Two further packages were added to this project, one for the generation of the Swagger UI and the second package to generate the required security headers for an API using JWT Bearer tokens.

Swashbuckle.AspNetCore.SwaggerUI
NetEscapades.AspNetCore.SecurityHeaders

The ASP.NET Core API application has already generated the OpenAPI definitions in a Json file. The Json can be used to create the UI. If the application is deployed to production, the Json file is not created and the security headers are deployed with the most restrictions. If the application is deployed for development, the Json is deployed and the security headers are weakened to allow this to work.

// Open up security restrictions to allow this to work
// Not recommended in production
//var deploySwaggerUI = app.Environment.IsDevelopment();
var deploySwaggerUI = app.Configuration.GetValue<bool>("DeploySwaggerUI");

app.UseSecurityHeaders(
    SecurityHeadersDefinitions.GetHeaderPolicyCollection(deploySwaggerUI));

// ... other middleware

app.MapOpenApi("/openapi/v1/openapi.json");

if (deploySwaggerUI)
{
    app.UseSwaggerUI(options =>
    {
        options.SwaggerEndpoint("/openapi/v1/openapi.json", "v1");
    });
}

The DeploySwaggerUI configuration is used to specify if the deployed version supports both a UI and an API or just an API with the most restrictive security settings.

{
  // Open up security restrictions to allow this to work
  // Not recommended in production
  "DeploySwaggerUI": true,

Setup security headers

The security headers are set up so that if the deployment is for development, scripts and styles are allowed. The configuration allowing scripts is weak and not recommended for production.

namespace WebApiOpenApi;

public static class SecurityHeadersDefinitions
{
    public static HeaderPolicyCollection GetHeaderPolicyCollection(bool isDev)
    {
        var policy = new HeaderPolicyCollection()
            .AddFrameOptionsDeny()
            .AddContentTypeOptionsNoSniff()
            .AddReferrerPolicyStrictOriginWhenCrossOrigin()
            .AddCrossOriginOpenerPolicy(builder => builder.SameOrigin())
            .AddCrossOriginEmbedderPolicy(builder => builder.RequireCorp())
            .AddCrossOriginResourcePolicy(builder => builder.SameOrigin())
            .RemoveServerHeader()
            .AddPermissionsPolicy(builder =>
            {
                builder.AddAccelerometer().None();
                builder.AddAutoplay().None();
                builder.AddCamera().None();
                builder.AddEncryptedMedia().None();
                builder.AddFullscreen().All();
                builder.AddGeolocation().None();
                builder.AddGyroscope().None();
                builder.AddMagnetometer().None();
                builder.AddMicrophone().None();
                builder.AddMidi().None();
                builder.AddPayment().None();
                builder.AddPictureInPicture().None();
                builder.AddSyncXHR().None();
                builder.AddUsb().None();
            });

        AddCspHstsDefinitions(isDev, policy);
        policy.ApplyDocumentHeadersToAllResponses();
        return policy;
    }

    private static void AddCspHstsDefinitions(bool isDev, HeaderPolicyCollection policy)
    {
        if (!isDev)
        {
            policy.AddContentSecurityPolicy(builder =>
            {
                builder.AddObjectSrc().None();
                builder.AddBlockAllMixedContent();
                builder.AddImgSrc().None();
                builder.AddFormAction().None();
                builder.AddFontSrc().None();
                builder.AddStyleSrc().None();
                builder.AddScriptSrc().None();
                builder.AddBaseUri().Self();
                builder.AddFrameAncestors().None();
                builder.AddCustomDirective("require-trusted-types-for", "'script'");
            });

            // maxage = one year in seconds
            policy.AddStrictTransportSecurityMaxAgeIncludeSubDomains(maxAgeInSeconds: 60 * 60 * 24 * 365);
        }
        else
        {
            // allow swagger UI for dev
            policy.AddContentSecurityPolicy(builder =>
            {
                builder.AddObjectSrc().None();
                builder.AddBlockAllMixedContent();
                builder.AddImgSrc().Self().From("data:");
                builder.AddFormAction().Self();
                builder.AddFontSrc().Self();
                builder.AddStyleSrc().Self().UnsafeInline();
                builder.AddScriptSrc().Self().UnsafeInline(); //.WithNonce();
                builder.AddBaseUri().Self();
                builder.AddFrameAncestors().None();
            });
        }
    }
}

When the application is deployed in development mode, the Swagger UI is displayed and the user can enter a valid access token and use the APIs.

Notes

At present, the Swagger UI does not support script best practices, and allowing it to work means deploying an insecure web application with weak security. Any UI web application should use a strong CSP definition, for example using CSP nonces. An API has no UI and so should be locked down.

Links

https://github.com/martincostello/dotnet-minimal-api-integration-testing

https://learn.microsoft.com/en-us/aspnet/core/fundamentals/minimal-apis/aspnetcore-openapi

https://learn.microsoft.com/en-us/aspnet/core/web-api/action-return-types

https://github.com/RicoSuter/NSwag

https://swagger.io/

https://swagger.io/specification/

Saturday, 10. August 2024

Jon Udell

How LLMs Guide Us to a Happy Path for Configuration and Coding

“Some argue that by aggregating knowledge drawn from human experience, LLMs aren’t sources of creativity, as the moniker “generative” implies, but rather purveyors of mediocrity. Yes and no. There really are very few genuinely novel ideas and methods, and I don’t expect LLMs to produce them. Most creative acts, though, entail novel recombinations of known ideas and methods. Because LLMs radically boost our ability to do that, they are amplifiers of — not threats to — human creativity.” – How LLMs Guide Us to a Happy Path for Configuration and Coding

Part of the LLM series at The New Stack.

Friday, 09. August 2024

reb00ted

Notes from our DWebCamp session imagining the future of open social networking

At this year’s DWebCamp, my friend and collaborator Kaliya aka IdentityWoman and I ran a session titled:

“Envisioning the future: what if a single, decentralized, interoperable, open social network indeed emerges and wins over the proprietary platforms?”

We did something that, at least for me, was new. We started with:

Assume it has happened. We have won. Like on page 1 of a sci-fi story, you wake up one morning and the world you want has suddenly arrived. At least 4 billion people interact with each other socially on an open, decentralized social network. All of the proprietary platforms have much smaller user numbers or have disappeared entirely; they have stopped mattering. Out of scope: how we get there, or which specific products/standards/technologies are being used.

We asked the question: “Describe what you see”. People who came to our session had a few minutes to meditate on that question, and then wrote down their thoughts, one at a time, on some stickies. They then announced what they wrote and put it on a big sheet of paper. Below is the electronic version of this paper (minus some items which I could not manage to decipher when transcribing):

Community focused and developed by the community
Hate free zone
Accessible resources (open source) for development by anyone
Message flows automatically (I don’t have to decide who are the recipients)
biofi.earth for decentralized technology
Not harmfully powered
Access to all medical data. Do I want all measurements? Which doctor may see what? How do I manage that?
Serendipity / surprise me
Anti-filter bubble
Multi-modal “of one universe”
Everything is forkable (in the open-source sense)
Everything is fully transparent
Attention management / needs to manage
Personal and collective sense-making bubble
Stigmergy / people marking
How to make the network safe?
Non-screen-centric
“Netflix”-like subscription for all: distributed to builders / operators
New funding models for open source
Cooperative Attention Economy
Fed by opt-in-able data sets
Ability to opt-in/opt-out
Personal digital exoskeleton / voice-centric
Not one system. Highly fragmented (needs are specific)
A shared collaborative universe, not just “apps”
Social “digital twin” objects
Who you are is what you consume: food and information
Photo gallery
If it isn’t worth looking at for at least 10 seconds, it isn’t worth any time at all
Tap to dive in, call, play, study / zoom out to explore and connect
Non-repetitive
Profile: Skills Expanse Reviews Map Faves Music
Corporate news is dead, replaced by emergent editorial boards
Threats community mapping / fear

What surprised me most is that there were so many comments that wanted to have a single “social universe” into which various apps and screens and voice are “windows”, rather than a single app. That’s very close to the original vision for the World Wide Web, and one that we’ve largely lost with all the apps we are using every day. But perhaps we can get it back?

Addition 2024-08-17: two more, which I don’t know how to transcribe.

Thursday, 08. August 2024

Just a Theory

Release: pgxn_meta v0.1.0

Today I released pgxn_meta v0.1.0, a Rust crate and executable for validating PGXN Meta v1 and v2 META.json files.

Following the recent spate of work drafting RFCs for a binary distribution format and Meta Spec v2, and bearing in mind the vote to implementing PGXN v2 apps in Rust, over the last few weeks I’ve switched gears to write some code.

First I wrote JSON Schemas for the v1 spec and then wrote a full test suite using the boon crate. Next I wrote and tested JSON Schemas for the v2 spec. This process informed revisions to the RFC, uncovering inconsistencies and unnecessary complexities.
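
As a rough illustration of the kind of constraint these schemas encode (and not the boon-based implementation described here), the sketch below uses only serde_json to reject a META.json that lacks the required version property, mirroring the validation error shown later in this post.

// Simplified stand-in for schema validation, using only serde_json;
// the real pgxn_meta crate compiles full JSON Schemas with boon.
use serde_json::Value;

fn check_required_version(meta: &Value) -> Result<(), String> {
    match meta.get("version") {
        Some(Value::String(_)) => Ok(()),
        Some(_) => Err("'version' must be a string".to_string()),
        None => Err("missing properties 'version'".to_string()),
    }
}

fn main() {
    // A META.json fragment with no version property, as in the invalid corpus file.
    let meta: Value = serde_json::from_str(r#"{ "name": "pair" }"#).unwrap();
    match check_required_version(&meta) {
        Ok(()) => println!("META.json is OK"),
        Err(e) => println!("validation failed: {e}"),
    }
}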

With META.json file JSON Schema validation fully worked up, I decided to work the functionality into a proper Rust crate and CLI to eventually replace the old PGXN::Meta::Validator Perl module and its validate_pgxn_meta CLI. This turned out to be a decent Rust starter project, requiring a fairly simple crate and CLI, but it also allowed me to develop patterns to build and release binaries for a variety of platforms and architectures.

As a result, I’m happy to announce the release today of the pgxn_meta crate and pgxn_meta CLI v0.1.0, available for download on these platforms, thanks to cross and houseabsolute/actions-rust-cross:

darwin-amd64 darwin-arm64 freebsd-amd64 freebsd-i686 illumos-amd64 linux-amd64 linux-arm linux-arm64 linux-i686 linux-powerpc linux-powerpc64 linux-powerpc64le linux-riscv64 linux-s390x linux-sparc64 netbsd-amd64 solaris-amd64 solaris-sparcv9 windows-amd64 windows-arm64 windows-i686

Download the archive file appropriate to your platform, decompress it, and put the pgxn_meta (or pgxn_meta.exe) binary in your path. Or use the universal binary installer (ubi) to install it:

ubi --project pgxn/meta --exe pgxn_meta --in ~/bin

And of course you can use cargo to compile it from source:

cargo install pgxn_meta

Usage is simple: just run pgxn_meta in a directory containing the META.json file to validate:

❯ pgxn_meta
META.json is OK

And optionally pass it the name of the file, as in this example parsing a test file with no version property:

❯ pgxn_meta corpus/invalid.json
Error: "corpus/invalid.json jsonschema validation failed with https://pgxn.org/meta/v2/distribution.schema.json#\n- at '': missing properties 'version'"

That’s it!

What’s Next?

Now that I’ve implemented validation and figured out multi-platform binary support for Rust apps, my next tasks are to:

Implement a pattern to convert a v1 META.json to the v2 format
Create a pattern to merge multiple META.json files into one
Write code to build PGXS extensions into trunk packages
Develop patterns to satisfy third-party dependencies for multiple platforms

Should keep me busy for a few weeks. Updates as I have them.

More about… Postgres PGXN JSON Schema Rust Metadata

Wednesday, 07. August 2024

Just a Theory

To Preload, or Not to Preload

When should a Postgres extension be pre-loaded and when should it not? Should it be loaded in user sessions or at server startup? For the Tembo blog, I dug into this question and wrote up my findings.

The Tembo Blog published a post by yours truly last week about when to preload shared libraries and when not to:

Recently I’ve been trying to figure out when a Postgres extension’s shared libraries should be preloaded. By “shared libraries” I mean libraries provided or used by Postgres extensions, whether LOADable libraries or CREATE EXTENSION libraries written in C or pgrx. By “preloaded” I mean under what conditions should they be added to one of the Shared Library Preloading variables, especially shared_preload_libraries.

The answer, it turns out, comes very much down to the extension type.

I view this post as a kind of proto-chapter for an imagined book about developing extensions that I’d like to work on someday. I learned quite a lot researching it and responding to extensive feedback from more knowledgeable community members. It resulted in updates to the PGXN Meta preload property that I hope will inform binary distribution in the future. More on that soon.

More about… PGXN Postgres Extensions Preload Extensions Book

Tuesday, 06. August 2024

Damien Bod

Implementing an ASP.NET Core API with .NET 9 and OpenAPI

This post implements a basic ASP.NET Core API using .NET 9 and the Microsoft OpenAPI implementation. The OpenAPI Nuget package supports both Controller based APIs and minimal APIs. Until now, we used excellent solutions like NSwag to produce the API schemas which can be used to auto-generate client code.

Code: https://github.com/damienbod/WebApiOpenApi

Setup

A .NET 9 project can be created using the .NET Web API templates. The required SDKs or Visual Studio version need to be installed. The OpenAPI Json definitions can be created using the Microsoft.AspNetCore.OpenApi Nuget package. The Microsoft Learn docs cover all the different possibilities for implementing this, depending on your preferred development environment. Here’s an example using PowerShell:

Install-Package Microsoft.AspNetCore.OpenApi -IncludePrerelease

JWT OAuth authorization is used in this example for the API. I installed the following packages:

Microsoft.AspNetCore.OpenApi
Microsoft.AspNetCore.Authentication.JwtBearer
Microsoft.Extensions.ApiDescription.Server

The OpenAPI services are added to the project. The API uses JWT Bearer and OAuth, and so the OpenAPI definitions should also add this. A BearerSecuritySchemeTransformer implementation was created as in the Microsoft documentation.

builder.Services.AddOpenApi(options =>
{
    options.AddDocumentTransformer<BearerSecuritySchemeTransformer>();
});

The BearerSecuritySchemeTransformer class implements the OpenAPI specific definitions. If using cookies, only the document.Info is required and this can be added directly in the services without an extra class. From the Microsoft docs:

internal sealed class BearerSecuritySchemeTransformer(IAuthenticationSchemeProvider authenticationSchemeProvider) : IOpenApiDocumentTransformer
{
    public async Task TransformAsync(OpenApiDocument document, OpenApiDocumentTransformerContext context, CancellationToken cancellationToken)
    {
        var authenticationSchemes = await authenticationSchemeProvider.GetAllSchemesAsync();
        if (authenticationSchemes.Any(authScheme => authScheme.Name == "Bearer"))
        {
            var requirements = new Dictionary<string, OpenApiSecurityScheme>
            {
                ["Bearer"] = new OpenApiSecurityScheme
                {
                    Type = SecuritySchemeType.Http,
                    Scheme = "bearer", // "bearer" refers to the header name here
                    In = ParameterLocation.Header,
                    BearerFormat = "Json Web Token"
                }
            };
            document.Components ??= new OpenApiComponents();
            document.Components.SecuritySchemes = requirements;
        }

        document.Info = new()
        {
            Title = "My API Bearer scheme",
            Version = "v1",
            Description = "API for Damien"
        };
    }
}

The middleware needs to be added to the pipeline. You should only add the OpenAPI middleware in development mode unless you require it in production.

app.MapOpenApi("/openapi/v1/openapi.json");

ASP.NET Core supports two types of API implementations: Controller-based and minimal APIs. Controller-based APIs are used in this project. This API endpoint requires a valid access token and has the different endpoint definitions.

[Authorize(AuthenticationSchemes = "Bearer")]
[ApiController]
[Route("[controller]")]
public class WeatherForecastController(
    ILogger<WeatherForecastController> _logger) : ControllerBase
{
    [EndpointSummary("This is a summary from OpenApi attributes.")]
    [EndpointDescription("This is a description from OpenApi attributes.")]
    [Produces(typeof(IEnumerable<WeatherForecast>))]
    [HttpGet("GetWeatherForecast")]
    public IActionResult Get()
    {
        _logger.LogDebug("GetWeatherForecast with OpenAPI definitions");

        return Ok(Enumerable.Range(1, 5).Select(index => new WeatherForecast
        {
            Date = DateOnly.FromDateTime(DateTime.Now.AddDays(index)),
            TemperatureC = Random.Shared.Next(-20, 55),
            Summary = Summaries[Random.Shared.Next(Summaries.Length)]
        })
        .ToArray());
    }

You can also add definitions for POSTs and parameters or whatever you require.

[EndpointSummary("This is a second summary from OpenApi attributes.")] [EndpointDescription("This is a second description from OpenApi attributes.")] [Produces(typeof(IEnumerable<WeatherForecast>))] [HttpPost("PostWeatherForecast")] public IActionResult PostWeatherForecast( [Description("parameter post item using OpenApi")] WeatherForecast weatherForecast) { return Ok(weatherForecast); }

When the application is started, the Json OpenAPI definitions can be downloaded and used.

Notes

I am not sure how good the OpenAPI specifications are and need to validate how to define the different status codes in the specifications with the correct return types.

Links

https://github.com/martincostello/dotnet-minimal-api-integration-testing

https://learn.microsoft.com/en-us/aspnet/core/fundamentals/minimal-apis/aspnetcore-openapi

https://learn.microsoft.com/en-us/aspnet/core/web-api/action-return-types

https://github.com/RicoSuter/NSwag

Saturday, 03. August 2024

Randall Degges

I'm Writing Again

I can’t believe it’s been over two years since I last published an article here! Time really flies.

The crazy part is that over these last two years, I’ve done more writing than ever; it just hasn’t been public.

Several years back, I started a journaling habit, and since then, most of my writing energy has been redirected from my website to my private journal.

While writing in private feels liberating, as I can dump every little thought in my head into an archive, I find myself missing the more structured and thought-out articles I used to push myself to publish.

There’s something magical about writing down your thoughts and sharing them with the world, even if nobody sees them. It forces you to rethink your position, clarify your thoughts, and distill the mess in your head into something direct and actionable.

Over the last couple of years, I have missed that feeling, and the itch to continue writing has returned.

All this is simply to say that I’m writing again. The title of my website has always been “Random Thoughts of a Happy Programmer,” so please stay tuned for more of my innermost thoughts and experiments.


Mike Jones: self-issued

Fully-Specified Algorithms Specification Addressing Feedback from IETF 120

Orie Steele and I have updated the “Fully-Specified Algorithms for JOSE and COSE” specification to incorporate feedback from IETF 120 in Vancouver. Specifically, the registrations for fully-specified Elliptic Curve Diffie-Hellman (ECDH) algorithms in draft 03 were removed, along with the previously proposed fully-specified ECDH algorithm identifiers, while continuing to describe how to create fully-specified ECDH algorithms in the future, if needed.

The specification is available at:

https://www.ietf.org/archive/id/draft-ietf-jose-fully-specified-algorithms-04.html

Tuesday, 30. July 2024

Jon Udell

When not to LLM

Here’s the latest installment in the series on working with LLMS: https://thenewstack.io/choosing-when-to-use-or-not-use-llms-as-a-developer/

For certain things, the LLM is a clear win. If I’m looking at an invalid blob of JSON that won’t even parse, there’s no reason to avoid augmentation. My brain isn’t a fuzzy parser — I’m just not wired to see that kind of problem, and that isn’t likely to change with effort and practice. But if there are structural problems with code, I need to think about them before reaching for assistance.

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

18 Creating a GPT Assistant That Writes Pipeline Tests

19 Using AI to Improve Bad Business Writing

20 Code in Context: How AI Can Help Improve Our Documentation

21 The Future of SQL: Conversational Hands-on Problem Solving

22 Pairing With AI: A Senior Developer’s Journey Building a Plugin

23 How LLMs Can Unite Analog Event Promotion and Digital Calendars

24 Using LLMs to Help Write a Postgres Function

25 Human Insight + LLM Grunt Work = Creative Publishing Solution

Monday, 29. July 2024

Kerri Lemole

Do We Need Blockchain for LERs? No.

When explaining the W3C Verifiable Credential Data Model (VC), Open Badges 3.0, and LERs, I’m often asked about the role of blockchain, and if I’m not asked, I feel compelled to explain because misconceptions and confusion are common. Blockchain is one of those technologies that is at once fascinating and cringey, useful and over-used. The business models are still experimental, and when it comes down to it, there are simpler, more efficient, and more affordable methods for issuing data-rich, portable, verifiable digital credentials that meet the needs of issuers, give learners and earners easy access to their data, and provide a private way to prove that their digitally signed achievements are theirs for as long as they need them.

You could say that I’ve spent some time thinking about this. In 2015, I co-founded a group called BadgeChain with other Open Badges community members that explored how Open Badges could be more verifiable and have clearer provenance if stored on a blockchain. This coincided with Philipp Schmidt and Kim Hamilton Duffy’s leadership on Blockcerts, an open specification for issuing Open Badges on Bitcoin and Ethereum. Learning Machine, which was later acquired by Hyland, continued with this work. In 2020 I was the lead researcher on the Education and Blockchain initiative with the American Council on Education and the US Department of Education, where we reported on the potential uses of blockchain and then supported several projects that used blockchain or blockchain-like technology for education and training credentials.

I truly believed that we were creating the future and would help society by leveraging what seemed like an obvious use of distributed digital ledgers but what we all learned from this was:

Blockchains are expensive and energy-sucky ways to store and verify credentials
It’s a headache to explain to and request public keys from credential recipients
Storing data on a blockchain (especially long-living ones like Bitcoin and Ethereum), even when hashed, is risky for PII (Personal Identifiable Information) and means that once it’s on a chain, it can’t be changed or removed. So, if any mistakes are made or data has changed, there’s no way to remove it.
It can create a vendor lock for the issuer and the recipient. When the vendor/network goes away, where do those credentials go?
It doesn’t free data trapped in centralized silos. It creates new types of silos.

During this time and since, the Credentials Community Group at W3C (CCG) started developing the W3C Verifiable Credentials Data Model (VC). The use cases for this were originally sourced from members of the Open Badges community in 2017. The W3C Verifiable Credentials for Education Task Force was founded soon after by the Digital Credentials Consortium (DCC), and it was this task force that agreed to propose Open Badges 3.0 to 1EdTech in 2021. Now Open Badges 3.0 is final and W3C VC 2.0 is almost complete.

Issuing Open Badges 3.0 is not only about how the achievement is described or the structure of the metadata. The metadata is a critical component to the standard but the bigger upgrade is the alignment to W3C VCs so that digital education credentials like badges, micro-credentials, certifications, and even degrees could be stored, shared, verified, and used on the web in the same secure way as driver’s licenses and citizenship documents that are also issued as W3C VCs. The intention is for all of these credentials to be understandable and verified similarly so that they can be used together.

W3C VCs are cryptographically verifiable, meaning that verifiers can check via software that the data hasn’t changed since it was signed, validate the identity of the issuer, and confirm that the credential hasn’t expired or been revoked. They are digitally signed by the issuer and provided to the credential earner. From that point on, that credential belongs to the earner, and whatever they do with it is their business only. Issuers may ask whether earners are willing to tell them where they use their credentials, but the basic premise of VCs is that they can be used independently. Just because the data is sourced from the issuer does not mean that how the data is used should be shared with the issuer without the earner’s consent. Would you want the DMV to know every time you share your driver’s license? Should the DMV get paid if you use your driver’s license? Even if it was digital? Why should it be any different for education and training credentials?
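
As a rough sketch of the checks described above, here is an illustrative Rust outline of what a verifier does with a credential: confirm the issuer is trusted, confirm the signature over the signed bytes, and check expiration and revocation. The struct fields and the verify_signature and is_revoked stubs are simplified assumptions; a real verifier follows the W3C VC data model and a concrete proof suite.

// Conceptual sketch only: field names are simplified and the signature
// check is stubbed out; a real verifier uses a concrete VC proof suite.
struct Credential {
    issuer: String,
    expiration: Option<u64>, // seconds since the Unix epoch
    payload: Vec<u8>,        // the signed bytes
    signature: Vec<u8>,
}

// Stub standing in for cryptographic verification against the issuer's
// published public key (e.g. resolved from a DID or an issuer registry).
fn verify_signature(_issuer: &str, _payload: &[u8], _signature: &[u8]) -> bool {
    true
}

fn is_revoked(_credential: &Credential) -> bool {
    false // a real verifier would consult a status list published by the issuer
}

fn verify(credential: &Credential, trusted_issuers: &[&str], now: u64) -> Result<(), &'static str> {
    if !trusted_issuers.contains(&credential.issuer.as_str()) {
        return Err("issuer is not trusted");
    }
    if !verify_signature(&credential.issuer, &credential.payload, &credential.signature) {
        return Err("signature does not match: data changed since signing");
    }
    if let Some(exp) = credential.expiration {
        if now > exp {
            return Err("credential has expired");
        }
    }
    if is_revoked(credential) {
        return Err("credential has been revoked");
    }
    Ok(())
}

fn main() {
    let cred = Credential {
        issuer: "https://issuer.example.edu".to_string(),
        expiration: None,
        payload: b"badge".to_vec(),
        signature: vec![],
    };
    println!("{:?}", verify(&cred, &["https://issuer.example.edu"], 1_700_000_000));
}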

Admittedly, there are interesting blockchain models to explore that are adjacent to or support W3C VCs and Open Badges 3.0. For example, we’re just starting to touch upon the usefulness of issuer registries and blockchain could be useful there; blockchain-like technologies like Fluree are fueling systems like the Trusted Learner Network (TLN) where distributed nodes of data can be accessed by students and alumni to produce VCs; systems like Ceramic and IPFS can be used for longevity and distribution of evidence and supporting files; some Decentralized identifier methods that use blockchain are being tested around the world.

A model where verifiers pay to access and verify credentials in a tokenized blockchain system compromises the privacy of the credential recipients and traps their credentials in a closed network. In the end, it’s the credential earners that get forced into a closed system where others could profit from their credentials that represent the investment of their time, work, and money. Also, what’s the incentive for verifiers to pay tokens for access to credentials when they could evaluate and accept other credentials on the web for free?

That is the opposite of the intention of W3C Verifiable Credentials and Open Badges 3.0. At the DCC along with many others, we promote the use of W3C VCs as distinct entities of digitally signed achievements that are in the hands of those who earn them, who choose how to use them, and who or what can have access to them. We need to make it easier and affordable for verifiers to get onboard and understand how VCs and Open Badges work so that they become commonplace and useful on the web for everyone.

For more information on the DCC, including membership, projects, and events, visit our website, wiki, and Linkedin. To receive our newsletter and invitations to webinars and events sign up for our community mailing list.

Do We Need Blockchain for LERs? No. was originally published in Digital Credentials Consortium on Medium, where people are continuing the conversation by highlighting and responding to this story.


What Are Portable, Verifiable Digital Credentials?

The Digital Credentials Consortium (DCC) is advancing the use and understanding of portable, verifiable digital credentials in higher education through open source technology development and leadership, research, and advocacy. What are portable, verifiable digital credentials and why is it important that we advance their use?

Digital credentials at their most basic definition are electronic assets that describe an achievement or a qualification. Credential Engine’s Credential Transparency Description Language Handbook describes a credential as “A qualification, achievement, personal or organizational quality, or aspect of an identity typically used to indicate suitability”. There are many different credential types that can be used in education, training, and industry as learning and employment records (LERs) to describe achievements and provide details on related skills and competencies.

When referring to credentials as “portable” and “verifiable” we are indicating that they have been created in alignment with the key web standards that are used to issue digitally signed, metadata rich digital credentials. The W3C Verifiable Credentials (VC) standard is the web standard for digital signatures and proving claims made on the web. Open Badges 3.0 is a standard conformant with W3C Verifiable Credentials that is specifically tailored to issue education and experience digital credentials. Both standards result in distinct electronic assets that can be held by individuals in wallets or other storage to be used later.

Through its participation in community standards organizations including the W3C and 1Edtech, the DCC has helped to develop these standards and is advancing their use and understanding so that individuals may have more access to their data and have the agency to freely and privately share and use their credentials. To accomplish this, the DCC is advocating that issuers of digital credentials use software and vendors that prioritize the use of the LER VC standards. Our members are issuing micro-credentials and even degrees as Open Badges and W3C Verifiable Credentials so that their students and alumni can have digitally signed, data rich markers of achievement that describe what they know and can do that are usable anywhere on the web (and in the world) for their lifetimes.

For more information on the DCC, including membership, projects, and events, visit our website, wiki, and Linkedin. To receive our newsletter and invitations to webinars and events sign up for our community mailing list.

What Are Portable, Verifiable Digital Credentials? was originally published in Digital Credentials Consortium on Medium, where people are continuing the conversation by highlighting and responding to this story.

Sunday, 28. July 2024

Jon Udell

How and why to write letters to voters

If you don’t live in a swing state, but would like to do more than just send money to help encourage voter turnout in those places, what are your options? For me the best one is Vote Forward, which orchestrates letter-writing to registered voters. I sent hundreds of such letters in 2020 and am aiming to do lots more, with help from friends, this time around.

Even if I lived in a swing state, I’m not someone who’d be comfortable knocking on doors. And the last thing I want to do is pester people in those places with yet another unwanted phone call or text message. The Vote Forward method is perfect for me personally, and I also think it’s the most clever and sensible way to encourage voters in other states. Here’s how it works.

You “adopt” voters in batches of 5 or 20. I just adopted my first 100: 20 in each of Ohio, Pennsylvania, Michigan, New Hampshire, and North Carolina. You download each batch as a PDF that prints 21 pages. Page one has the instructions and the list of registered voters’ names and addresses.

The fact that you write the letters (and address the envelopes) by hand is a great idea. We receive very few hand-addressed letters nowadays, so I think they have a pretty good chance of being opened. And once opened, the hand-written message is again unusual. The fact that somebody made the effort to do that signals a rare kind of authenticity.

Likewise, I think the nonpartisan tone of the message is unusual and conveys authenticity. I wish voting were mandatory in the US, as it is in Australia and elsewhere. However the chips fall in November, I would like to know that the result truly reflects what everyone thinks. My message last time was something like:

“… because it’s not really a democracy unless everyone’s voice is heard.”

Pages 2-21 are the letter templates. They look like this:

The hardest part for me was the handwriting. I famously struggled with cursive writing in fifth grade. By the time I reached high school I had reverted to printing. Then, in college, I realized that cursive is more efficient and relearned how to do it. I had to relearn all over again in 2020 because cursive was the fastest way to write all those letters. And I’ll probably have to relearn again this time. I suspect many in younger generations never learned cursive at all, in which case writing the letters by hand will be even harder. So: keep the message short!

If you’ve received a link to this post directly from me, it’ll come with an invitation to drop by our house, hang out on the porch, and help me complete batches of these letters. Otherwise, I hope you might try this method yourself, and/or share it with others. In the past week I’ve switched from doomscrolling to hopescrolling and that’s a huge relief. But I also want to do something tangible (again, beyond donations) and this will be my focus. It feels good to do the work, and will feel really good when I visit the post office sometime in October and drop off a big stack of hand-addressed envelopes.

But is it effective? That’s another thing I like about Vote Forward. They’ve made a sincere effort to measure the impact. And they are honest about the findings: the measurable effect is small. I’ll give them the last word here.

Why should we get excited about small differences?

Because getting people who don’t vote to show up at the polls (or mail in a ballot) is actually pretty hard. Most of the factors that affect whether people vote are tied to big, structural issues (like voter ID laws or polling place accessibility) or deep-seated attitudes (e.g., a lack of faith that elections matter). Given these obstacles, boosting turnout by even a small amount is a real achievement! And, when it comes to politics, we know that many races are decided by tight margins, so a small boost in turnout can translate into a meaningful difference in electoral outcomes.

Thursday, 25. July 2024

Mike Jones: self-issued

Fourth and Likely Last Implementer’s Draft of OpenID Federation Specification

The OpenID Foundation has approved the Fourth Implementer’s Draft of the OpenID Federation Specification. This is a major step towards having the specification become final.

The previous Implementer’s Draft was in 2021. A lot has happened since then, largely motivated by feedback from actual implementations and deployments. Some highlights of progress made in the spec since then are:

Changed name from OpenID Connect Federation to OpenID Federation, since Federation can be used for trust establishment for any protocol (including OpenID Connect).
Introduced distinct Federation endpoints.
Clearly defined and consistently used the terms Entity Statement, Entity Configuration, and Subordinate Statement.
Clearly defined which claims can occur in which kinds of Entity Statements.
Clearly defined Entity Types and the Federation Entity entity type.
Enhanced description of Trust Mark issuance and usage.
Defined relationship between metadata and metadata policy.
Clearly defined interactions between policy operators.
Defined where constraints may occur.
Tightened descriptions of Automatic Registration and Explicit Registration.
Added Historical Keys.
Defined and used trust_chain JWS Header Parameter.
Allowed Trust Chains to start with non-Trust Anchors.
Clarified use of client authentication.
Used OAuth Protected Resource Metadata.
Consistent error handling.
Added General-Purpose JWT Claims section.
Comprehensive use of content types and media types.
IANA registration of parameters, claims, and media types.
Added and improved many diagrams.
Substantial rewrites for increased consistency and clarity.
Added Giuseppe De Marco and Vladimir Dzhuvinov as editors.

As a preview of coming attractions, I’ll note that profiles of OpenID Federation are being written describing how it is being used in wallet ecosystems and how it is being used in open finance ecosystems. And we’re creating a list of implementations. Watch this space for future announcements.

Special thanks to all the implementers and deployers who provided feedback to get us to this point!

Wednesday, 24. July 2024

Altmode

Kimberley Trip, Day 21: Perth to Home

Wednesday, June 12, 2024

This being our last day in Perth, we had a leisurely start, although that was limited by the need to check out of our hotel room on time. We had breakfast in the hotel, finished packing, and checked our bags for the day at the front desk.

Entrance to WA Museum Boola Bardip

We began with a visit to the WA Museum Boola Bardip (Boola Bardip means Many Stories in the local indigenous language), which is located in an area designated as Perth’s cultural center, with theaters and performance venues as well as the Western Australia library and archives. The extensive museum has exhibit halls dedicated to wildlife, cultural, and historical themes. The exhibits, many of them interactive, were well-designed and informative.

It was lunchtime after that, and Kenna remembered a “healthy burgers” chain, Grill’d, that we tried and liked during last spring’s trip to Brisbane. We split a lamb burger, which was a nice change of pace.

We stopped by the Perth Town Hall but couldn’t see much because of an event. The Town Hall is the second-oldest building in Perth, so the receptionist suggested we check out the oldest building, the Old Court House Law Museum. The main room of the building was a small courtroom set up as it would have been in Perth’s early days. The museum also had several exhibits about the evolution of laws applied to Aboriginal people and women, which showed how much things have changed since the beginning of the 20th century and how far we still need to go.

Perth Mint

We continued on to the Perth Mint, where we took a short self-guided tour that explained the Mint’s history and how it related to the Australian gold mining industry. It was fun talking with one of the “coin geeks” there. Today, I learned that successive British monarchs face opposite directions when depicted on coins. Queen Elizabeth II faces right, but King Charles faces to the left when he appears on Australian coins.

We then went to Elizabeth Quay for a break and a small snack and watched our final Perth sunset of the trip from there. We walked back to the hotel, collected our bags, and took a taxi to the airport. Unfortunately, we arrived about an hour before check-in opened, but we waited that out, passed through security, and caught our 11:45 flight to Melbourne. After a short layover we caught a nonstop to San Francisco, a 13-hour flight but a comfortable one. It was good to get home, early on June 13. Now for the challenge of staying awake, to get onto California time!

This article is the final installment in a series about our recent trip to Indonesia and Australia. To see the introductory article in the series, click here.

Tuesday, 23. July 2024

Altmode

Kimberley Trip, Day 20: Rottnest Island


Tuesday, June 11, 2024

We have heard a lot about Rottnest Island, which is just off the coast of Perth. Rottnest (“Rotto” to the locals) is a recreational island with cabins for rent, several beaches with snorkeling (in the summer), restaurants, a golf course, skydiving, and more. It is also the home of about 10,000 quokkas. Quokkas are small marsupials billed as “the happiest animals on earth.”

We booked a round trip on the Rottnest Express last night, so we got an early start this morning and grabbed coffee and a breakfast wrap on the way to the pier. We got seats upstairs for both the view and the airflow. The trip started slow (about 10 knots) as we left the pier, increased to about twice that once we reached a broad section of the Swan River, and slowed again as we approached Fremantle. We stopped, picked up more passengers there, and then proceeded to Rottnest Island at a higher speed.

Although we had been equipped with a map, we weren’t sure where to go when we got to Rottnest. We stumbled on the main village and were heading out when we noticed some rain starting to fall. We returned to the village to get some snacks, and the sky opened up: it poured! We were thankful to be in the village as there weren’t a lot of other places to get out of the rain. While there, we spotted our first quokka, as well as the gates on the grocery store and other businesses designed to keep quokkas out.

Despite a lot of public education not to feed the quokkas, many of them hang out in the village to eat up the crumbs and other droppage from human diners. They are indeed cute, but the “always smiling” thing seemed a little overstated. They can be a little aggressive, too: when Kenna and I shared a chicken sandwich for lunch, a quokka climbed on the bench between us and tried to get to the sandwich. We moved rather quickly.

Osprey Nest at Cape Vlamingh, Rottnest Island

The rain continued off and on, limiting our ability to venture beyond the village until we boarded a bus for a “discovery” tour we had booked for the afternoon. The tour took us around the perimeter of the island, with excellent descriptive commentary from a guide on the bus. We stopped at several points to take pictures, most notably at the far west end facing the vast Indian Ocean, where there was also a large osprey nest.

Following our tour, we stopped briefly to pick up souvenirs (a t-shirt for me), and it was time for our return trip to Perth. Along the way, in one of the slower-speed segments, a couple of hydrofoil surfers managed to “draft” off our ship, keeping up for quite a long time. We were also tailed briefly by a few dolphins.

Returning to our hotel, we were exhausted but managed to venture out for a quick pizza dinner.

This article is part of a series about our recent trip to Indonesia and Australia. To see the introductory article in the series, click here.

Monday, 22. July 2024

Uncle Sams Blog

Harris’ Struggle to Connect Now Imperative

President Joe Biden, grappling with dementia, has finally stepped aside, leaving Vice President Kamala Harris to run in his place for re-election. This is the breath of fresh air Dems…

Altmode

Kimberley Trip, Day 19: Perth and Fremantle


Monday, June 10, 2024

Kenna and I slept in a bit today, then set out into downtown Perth. We grabbed breakfast at one of the many coffee shops. Perth seems to be quite a coffee city, but there are no Starbucks here.

We walked to Elizabeth Quay and the Barrack Street jetty, from which many tour boats originate. We picked up some information on Rottnest Island tours and were surprised when Lee and Colleen, two of our fellow passengers on the Orion, also showed up. We may see them again.

From there, we walked to Kings Park and its botanical garden. We made some wrong turns but eventually found it and discovered it is up quite a hill from downtown and has a great view. The botanical garden is nicely labeled and includes specific sections for areas like the Kimberley. From a distance, we saw Lee and Colleen here too. We could have spent a lot more time here, but we wanted to make it out to Fremantle, so we caught a bus to the train station and a train to Fremantle.

One of our first goals in Fremantle (“Freo” to the locals) was to visit the Indian Ocean for the first time. We went out to a nearby beach and touched the water, which, of course, felt like…water. Soon after that, we ran into a couple of other fellow passengers from the Orion, who had just returned from the Maritime Museum. They told us it was excellent, so we headed over there.

After a quick late lunch in their snack bar, we toured the exhibits. One memorable exhibit for me was Australia II, the Australian winner of the America’s Cup sailing race in 1983. It was a big deal at the time and continues to be a source of pride for Australians.

Fremantle hadn’t initially seemed much like other beach towns (Santa Cruz, Brighton UK, etc.) I have visited. But as we headed more into the city center, we began to see the makings of a beach town: some brewpubs, a game arcade, souvenir shops, etc. It was still somewhat more restrained than many.

Exhausted from several miles of walking, we returned to the train station and rode back to Perth, where we had more walking to do to get to the hotel. We opted to have dinner in the hotel’s pub, which was mediocre and surprisingly closed at 8 p.m. — earlier than most any pub I have seen.

This article is part of a series about our recent trip to Indonesia and Australia. To see the introductory article in the series, click here.

Sunday, 21. July 2024

Aaron Parecki

My IETF 120 Agenda

The sessions I will be attending and presenting at during IETF 120 in Vancouver

Here's where you can find me at IETF 120 in Vancouver!

Monday
- 9:30 - 11:30 • alldispatch • Regency C/D
- 13:00 - 15:00 • oauth • Plaza B
- 18:30 - 19:30 • Hackdemo Happy Hour • Regency Hallway

Tuesday
- 15:30 - 17:00 • oauth • Georgia A
- 17:30 - 18:30 • oauth • Plaza B

Wednesday
- 9:30 - 11:30 • wimse • Georgia A
- 11:45 - 12:45 • Chairs Forum • Regency C/D
- 17:30 - 19:30 • IETF Plenary • Regency A/B/C/D

Thursday
- 17:00 - 18:00 • spice • Regency A/B
- 18:30 - 19:30 • spice • Regency A/B

Friday
- 13:00 - 15:00 • oauth • Regency A/B

My Current Drafts
- OAuth 2.1
- OAuth for Browser-Based Apps
- OAuth Protected Resource Metadata
- OAuth for First-Party Apps
- OAuth Client ID Metadata Document
- Identity Assertion Authorization Grant
- OAuth Global Token Revocation

Altmode

Kimberley Trip, Day 18: Darwin to Perth


Sunday, June 9, 2024

Today started quite early because they needed to get us off the Orion early and prepare it in eight hours for the next cruise. We had our suitcases out for pickup at 7 a.m., ate breakfast, and exited the ship about 7:30. We had to go through Australian customs, which went smoothly for us but resulted in delays for some others, primarily some MDs who had medical supplies with them and one passenger who had bought a wooden oar somewhere along the way.

There were two busloads of passengers going to the airport. Our bus driver polled the passengers about our departure times and then took us on an impromptu bus tour of Darwin. Darwin is a surprisingly modern and clean city, and the driver was rightfully proud of his home town.

Darwin skyline

We arrived at the airport about three hours before our flight. Unfortunately, we couldn’t check our luggage until 90 minutes before flight time, so we had to wait outside security. There were enough of us there that we were able to use the time to reminisce about the cruise.

When the time came, we obtained our luggage tags and sent our bags down the conveyor. Unlike in the US, we just entered our names and destinations to check our luggage; no ID was required. Similarly, we went through security without showing an ID or our boarding passes, and there was no ID check getting onto the plane. This was very surprising to us!

The flight to Perth was uneventful. When we arrived in Perth, we found the climate to be very different. While it is the dry season in Darwin and the Kimberley coast, it is the wet season in Perth, which has a climate quite similar to the San Francisco Bay Area but shifted by six months. It was rainy and cloudy when we arrived.

When we got off the plane, we were surprised to see Dave’s friend Bruce and his wife waiting for us at the gate! That’s a consequence of not requiring a boarding pass to go through security and was a nice throwback to pre-9/11 practices.

We checked into our hotel, The Melbourne Hotel (yes, located in Perth), and found that we had booked a nicely outfitted but unusual guest room. For example, there was no door to any of the bathroom area; it was open to the bedroom. This is fine, but it takes a little getting used to.

Bruce had made reservations at a lovely restaurant in North Fremantle, Bib and Tucker. I had kangaroo for dinner, which had a nice flavor and is a very lean meat (accordingly cooked somewhat rare). Our hosts pointed out that the kangaroo they serve is farmed explicitly for that purpose.

After a very busy day, we said goodbye to Jan and Dave who left for the US that evening and to our hosts. We took the train from North Fremantle back to Perth, where we crashed.

This article is part of a series about our recent trip to Indonesia and Australia. To see the introductory article in the series, click here.

Saturday, 20. July 2024

Altmode

Kimberley Trip, Day 17: Return to Darwin


Saturday, June 8, 2024

Today is an at-sea day as we cross back over the Joseph Bonaparte Gulf to Darwin, where we will disembark from the Orion tomorrow morning. Again, the seas were rough, but not as much as the outgoing crossing. As with other at-sea days, there were lectures from the naturalists and an information session on future cruises. The day ended with the Captain’s cocktail reception and dinner. The cocktail reception featured a slideshow of photos that passengers had submitted (unfortunately, I misunderstood how to submit mine). There was also an auction of a specially-decorated flag from the ship to benefit the crew welfare fund and of a didgeridoo from Goyma’s family, the proceeds from which went back to his family. Happily, both auctions produced high bids for the beneficiaries.

For dinner, Dave and Jan were kind enough to join us in the COVID dining area for our final dinner on the Orion. It was a particularly good dinner, and we witnessed a beautiful sunset and moonset.

This article is part of a series about our recent trip to Indonesia and Australia. To see the introductory article in the series, click here.

Thursday, 27. June 2024

Just a Theory

Patch: Postgres ABI and API Guidance

Dear Postgres extension developers: Please review and give feedback on the proposed patch adding ABI and API guidance to the C language documentation.

Update 2024-11-14

I forgot to update this post at the time, but on July 31, Peter Eisentraut committed the patch adding ABI and API guidance to the C language documentation. I only noticed because today’s releases contained a modified ABI that broke a number of extensions. See the hackers thread for details.

TL;DR

If you’re a Postgres extension developer interested in understanding what to expect from core API and ABI stability, please review and give feedback on this patch (or pull request) adding ABI and API Guidance to the documentation.

In my PGConf.dev report a couple days ago, I mentioned that a few actionable items came out of the Improving extensions in core unconference session. One was the need to document the heretofore unofficial policy for API and ABI stability between major and, especially, minor versions of Postgres.

A frequent topic at the Extension Summit and Mini-Summits and a number of PGConf sessions has been concern regarding compatibility changes between minor releases of Postgres. At Mini Summit Five, for example, Yurii Rashkovskii presented a few examples of such changes, leading him to conclude, along with several others in the community, that extensions using the C API can be relied upon to work only with the minor release against which they were built.

In the Unconference session, core committers reported that such changes are made carefully, precisely to avoid compatibility issues, and rarely, if ever, affect extensions compiled for different minor releases of the same major version. In the case Yurii found, for example, a field was added to a struct’s padding without affecting the ordering of other fields, minimizing the risk of runtime failures.

It became clear that, although the committers follow a policy — and read new committers into it via patch review — it’s not documented anywhere. The result has been a bunch of sturm und drang amongst extension developers unsure what level of compatibility to depend on and what changes to expect.

The week after the conference, I started a pgsql-hackers thread proposing to document the committer policy. Following some discussion and review of potential ABI breaks in minor releases, the consensus seemed to be that the committers strive to avoid such breaks, that they’re quite uncommon in minor releases, and that most of the reported issues were due to using more obscure APIs.

As a result, we started drafting a policy, and after a few iterations, Peter Eisentraut pulled things together from the perspective of a core team member, reframed as “Server API and ABI Guidance”. I converted it into a patch (and pull request) to add it to the C Language docs. A key statement on minor releases:

In general, extension code that compiles and works with a minor release should also compile and work with any other minor release of the same major version, past or future.

I hope this document clarifies things. Even if it’s not as strict as some might hope, it at least documents the project’s approach to compatibility, so we have a better idea what to expect when using the C APIs. If you see gaps or have additional questions, please respond to the pgsql-hackers thread — or the pull request (I’ll propagate comments to hackers).


Thursday, 28. March 2024

Just a Theory

PGXN v2: Go or Rust?

What programming language(s) should we use to build new and revamp existing PGXN services and tools: Rust or Go? Vote your preference!

TL;DR: I’d like Postgres community input on a decision: Should we build PGXN v2 services and tools in Go or Rust? Context for the question and some weighing of options make up the rest of this post, but to skip to the end, 🗳️ Vote your choice! Poll closes April 12 at the end of the day (midnight) New York time.

The PGXN v2 project now under way requires developing or updating several services and tools, including:

- A root registry for source distribution
- A package registry for binary distribution
- A command line client for developing, building, and distributing extension packages
- An interactions service for notifications and stats aggregation

And more. Historically, the PGXN tools were written in Perl, which was an ideal choice for me back in 2011, and quite familiar to many members of the core team, but also deeply foreign to most everyone else. Furthermore, its dynamic nature and need for dozens of dependencies in most apps make installation and packaging a challenge, to say the least.1

In the last ten years I’ve become quite proficient in Go. I appreciate its near system-level performance, memory safety, robust standard library, concurrency design, and short learning curve — especially for web services. But perhaps most eye-opening to me, as a long-time user of dynamic languages, is that, like C, Go compiles an application into a single static binary. Not only that, but Go provides cross compilation natively. This makes distribution incredibly simple.

Distribution Digression

Compare, for example, the Dockerfile for Sqitch, the database change management system I wrote and maintain in Perl. It’s…a lot. Sure there are a ton of system dependencies, but what’s invisible in this file is the weeks of work that went into Module::Build::Sqitch, which performs a bunch of tricks to build the Sqitch “app” as a single directory with all of its Perl dependencies. Don’t get me wrong, the work was worth it for Sqitch, and powers the Homebrew formula, as well. But even there, I’ve not been able to get Sqitch into the Homebrew core because every single dependency requires a checksum, and I’ve not had the time (or energy) to figure out how to generate them.

Contrast with this Dockerfile for a Go service compiled into a binary named thinko:

FROM gcr.io/distroless/base-debian12:latest
# TARGETOS and TARGETARCH: https://docs.docker.com/build/guide/multi-platform/
ARG TARGETOS
ARG TARGETARCH
COPY "_build/${TARGETOS}-${TARGETARCH}/thinko" /thinko/bin/
USER nonroot:nonroot
ENTRYPOINT [ "/thinko/bin/thinko" ]

That’s the whole thing. There are no dependencies at all, aside from a few included in the distroless image. And where does that thinko binary come from? Here is the relevant section of the project Makefile:

.PHONY: all

# Build all binaries
all: local linux darwin windows freebsd
linux: thinko-linux
darwin: thinko-darwin
windows: thinko-windows
freebsd: thinko-freebsd

thinko-linux: _build/linux-amd64/thinko _build/linux-arm64/thinko
thinko-darwin: _build/darwin-amd64/thinko _build/darwin-arm64/thinko
thinko-windows: _build/windows-amd64/thinko _build/windows-arm64/thinko
thinko-freebsd: _build/freebsd-amd64/thinko _build/freebsd-arm64/thinko

# Build Thinko for specific platform
_build/%/thinko: cmd/thinko
	GOOS=$(word 1,$(subst -, ,$*)) GOARCH=$(word 2,$(subst -, ,$*)) $(GO) build -o $@ ./$<

This configuration allows me to build thinko for every OS and architecture at once:

$ make thinko
go build -o _build/local/thinko ./cmd/thinko
GOOS=linux GOARCH=amd64 go build -o _build/linux-amd64/thinko ./cmd/thinko
GOOS=linux GOARCH=arm64 go build -o _build/linux-arm64/thinko ./cmd/thinko
GOOS=darwin GOARCH=amd64 go build -o _build/darwin-amd64/thinko ./cmd/thinko
GOOS=darwin GOARCH=arm64 go build -o _build/darwin-arm64/thinko ./cmd/thinko
GOOS=windows GOARCH=amd64 go build -o _build/windows-amd64/thinko ./cmd/thinko
GOOS=windows GOARCH=arm64 go build -o _build/windows-arm64/thinko ./cmd/thinko
GOOS=freebsd GOARCH=amd64 go build -o _build/freebsd-amd64/thinko ./cmd/thinko
GOOS=freebsd GOARCH=arm64 go build -o _build/freebsd-arm64/thinko ./cmd/thinko

The Linux commands build thinko for amd64 and arm64, right where the Dockerfile expects them. Building the images is then easy; a separate make target runs the equivalent of:

$ docker buildx build --platform linux/arm64 -f dist/Dockerfile .
$ docker buildx build --platform linux/amd64 -f dist/Dockerfile .

The --platform flag sets the TARGETOS and TARGETARCH arguments in the Dockerfile, and because the directories into which each binary was compiled use those same values, the binary compiled for the right OS and architecture gets copied right in.

And that’s it, it’s ready to ship! No mucking with dependencies, tweaking system issues, removing unneeded stuff from the image. It’s just the bare minimum.
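For what it’s worth, the post never shows thinko’s source. A minimal, hypothetical sketch of cmd/thinko/main.go, which is all the Makefile above assumes, might look like the following; it uses only the standard library, so cross-compiled builds are self-contained binaries that drop cleanly into the distroless image (the THINKO_ADDR variable and /healthz route are illustrative, not taken from any real project).

package main

import (
	"log"
	"net/http"
	"os"
)

// Hypothetical entry point for the thinko binary built by the Makefile above.
// Standard library only, so `go build` needs no external dependencies.
func main() {
	addr := os.Getenv("THINKO_ADDR") // illustrative env var; defaults below
	if addr == "" {
		addr = ":8080"
	}
	http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("ok"))
	})
	log.Printf("thinko listening on %s", addr)
	log.Fatal(http.ListenAndServe(addr, nil))
}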

This pattern works not just for Docker images, of course. See, for example, how Hugo, the Go blog generator, releases tarballs for a bunch of OSes and architectures, each containing nothing more than a README.md, LICENSE.md, and the hugo binary itself. This pattern allows both the Hugo Homebrew formula and its Dockerfile to be incredibly simple.

Back to PGXN

I very much want these advantages for the next generation of PGXN tools. Not only the services, but also the command-line client, which would become very easy to distribute to a wide variety of platforms with minimal effort.

But there are other variables to weigh in the choice of language for the PGXN servers and tools, including:

Familiarity to other developers: Ideally someone can quickly contribute to a project because they’re familiar with the language, or there’s a short learning curve.

Safety from common issues and vulnerabilities such as buffer overflows and dangling pointers.

Tooling for robust and integrated development, including dependency management, testing, distribution, and of course cross-compilation.

Decisions, Decisions

In my experience, there are two languages that fulfill these requirements very well:

🐿️ Go2
🦀 Rust

Which should we use? Some relevant notes:

I expect to do the bulk of the initial development on PGXN v2, as the only person currently dedicated full time to the project, and I’m most familiar with Go — indeed I enjoy writing web services and CLIs in Go! I’d therefore be able to ship Go tools more quickly.

But I’ve played around with Rust a number of times over the years, and very much would like to learn more. Its syntax and long feature list steepen the learning curve, but given my background in Perl — another language with unique syntax and context-sensitive features — I’m certain I could become incredibly proficient in Rust after a few months.

My employer, Tembo, is a Rust shop, and we’ll likely borrow heavily from the trunk project, especially for the CLI and binary registry. It would also be easier for my coworkers to contribute.

pgrx, the tooling to build Postgres extensions in Rust, has taken the community by storm, rapidly building familiarity with the language among extension developers. Perhaps some of those developers would be willing to turn their expertise to PGXN Rust contributions, as well. It’s likely some features could be borrowed, too.

Sadly, the plgo project appears to have stalled, so has not built up the same community momentum.

This leaves me torn! But it’s time to start coding, so it’s also time to make some decisions. Should the PGXN v2 services and tools be:

🐿️ Written in Go
🦀 Written in Rust
🐿️ + 🦀 Some of each (e.g., Go for web services and Rust for CLIs)

What do you think? If you were to contribute to PGXN, what language would you like to work in? Do you think one language or the other would be more compatible with community direction or core development?3

Got an opinion? 🗳️ Vote! Poll closes April 12 at the end of the day (midnight) New York time.

And if those choices aren’t enough for you, please come yell at me on Mastodon, or via the #extensions channel on the Postgres Slack. Thanks!

Ever wonder why PGXN isn’t hosted by community servers? It’s because I screwed up the installation trying to balance all the dependencies without wiping out Perl modules the systems depend on. 🤦🏻‍♂️ ↩︎

Pity there’s no gopher emoji yet. ↩︎

I can imagine a future where an extension CLI was included in core. ↩︎
