Last Update 6:44 AM April 19, 2024 (UTC)

Identity Blog Catcher

Brought to you by Identity Woman and Infominer.
Support this collaboration on Patreon!!!

Friday, 19. April 2024

Simon Willison

Ruff v0.4.0: a hand-written recursive descent parser for Python

Ruff v0.4.0: a hand-written recursive descent parser for Python The latest release of Ruff - a Python linter and formatter, written in Rust - includes a complete rewrite of the core parser. Previously Ruff used a parser borrowed from RustPython, generated using the LALRPOP parser generator. Victor Hugo Gomes contributed a new parser written from scratch, which provided a 2x speedup and also adde

Ruff v0.4.0: a hand-written recursive descent parser for Python

The latest release of Ruff - a Python linter and formatter, written in Rust - includes a complete rewrite of the core parser. Previously Ruff used a parser borrowed from RustPython, generated using the LALRPOP parser generator. Victor Hugo Gomes contributed a new parser written from scratch, which provided a 2x speedup and also added error recovery, allowing parsing of invalid Python - super-useful for a linter.

I tried Ruff 0.4.0 just now against Datasette - a reasonably large Python project - and it ran in less than 1/10th of a second. This thing is Fast.


John Philpin : Lifestream

Anyone have a nice OPML of micro blog feeds that they would

Anyone have a nice OPML of micro blog feeds that they would care to share? (Asking for a friend.)

Anyone have a nice OPML of micro blog feeds that they would care to share?

(Asking for a friend.)


When you sell a house, you pay a realtor, various service an

When you sell a house, you pay a realtor, various service and inspections, possibly the tax man. Maybe an accountant and/ or lawyer … and and and … there is no question … you pay. When you buy a house, it’s not dissimilar, you pay. Need the services of a lawyer .. pay. an accountant .. pay buying ‘little things’ in stores … pay big things, like a car, airline ticket? … pay need a plumb

When you sell a house, you pay a realtor, various service and inspections, possibly the tax man. Maybe an accountant and/ or lawyer … and and and … there is no question … you pay.

When you buy a house, it’s not dissimilar, you pay.

Need the services of a lawyer .. pay.

an accountant .. pay

buying ‘little things’ in stores … pay

big things, like a car, airline ticket? … pay

need a plumber, electrician, carpenter … pay

a ride from a to b … pay.

You get the picture … you … pay

Except for some reason people don’t really want to pay you for your writing, your music, your advice, your help, your network … all that is apparently on the ‘pay it forward’ basis.


When you sell a house, you pay a realtor, various service an

When you sell a house, you pay a realtor, various service and inspections, possibly the tax man. Maybe an accountant and/ or lawyer … and and and … there is no question … you pay. When you buy a house, it’s not dissimilar, you pay. Need the services of a lawyer .. pay. an accountant .. pay buying ‘little things’ in stores … pay big things, like a car, airline ticket? … pay need a plumb

When you sell a house, you pay a realtor, various service and inspections, possibly the tax man. Maybe an accountant and/ or lawyer … and and and … there is no question … you pay.

When you buy a house, it’s not dissimilar, you pay.

Need the services of a lawyer .. pay.

an accountant .. pay

buying ‘little things’ in stores … pay

big things, like a car, airline ticket? … pay

need a plumber, electrician, carpenter … pay

a ride from a to b … pay.

You get the picture … you … pay

Except for some reason people don’t really want to pay you for your writing, your music, your advice, your help, your network … all that is apparently on the ‘pay it forward’ basis.


I see lots of posts these days on LinkedIn from recent ‘arri

I see lots of posts these days on LinkedIn from recent ‘arrivals in the work force’, where the message is some version of .. ‘I am available .. if you need a python developer.’ I wonder if their request might be better served by ‘writing’ ‘I am available .. if you need someone to solve your (x marks the spot) problem. I, along with a few others have many problems. We have no idea if we n

I see lots of posts these days on LinkedIn from recent ‘arrivals in the work force’, where the message is some version of ..

‘I am available .. if you need a python developer.’

I wonder if their request might be better served by ‘writing’

‘I am available .. if you need someone to solve your (x marks the spot) problem.

I, along with a few others have many problems. We have no idea if we need ‘python’ developers.


There is a real sense of frustration in Ukraine about a pe

There is a real sense of frustration in Ukraine about a perceived double standard in US support for Israel. “The view among some is that if Britain, America, Jordan and all these other countries are willing to shoot down drones to help out Israel, why can’t they do something similar for us when we’re facing this all the time?” … from Today’s Yesterday’s Guardian

There is a real sense of frustration in Ukraine about a perceived double standard in US support for Israel. “The view among some is that if Britain, America, Jordan and all these other countries are willing to shoot down drones to help out Israel, why can’t they do something similar for us when we’re facing this all the time?”

… from Today’s Yesterday’s Guardian


It’s clear that there is a dickhead in charge over in the Mi

It’s clear that there is a dickhead in charge over in the Middle East.

It’s clear that there is a dickhead in charge over in the Middle East.


Simon Willison

A POI Database in One Line

A POI Database in One Line Overture maps offer an extraordinarily useful freely licensed databases of POI (point of interest) listings, principally derived from partners such as Facebook and including restaurants, shops, museums and other locations from all around the world. Their new "overturemaps" Python CLI utility makes it easy to quickly pull subsets of their data... but requires you to

A POI Database in One Line

Overture maps offer an extraordinarily useful freely licensed databases of POI (point of interest) listings, principally derived from partners such as Facebook and including restaurants, shops, museums and other locations from all around the world.

Their new "overturemaps" Python CLI utility makes it easy to quickly pull subsets of their data... but requires you to provide a bounding box to do so.

Drew Breunig came up with this delightful recipe for fetching data using LLM and gpt-3.5-turbo to fill in those bounding boxes:

overturemaps download --bbox=$(llm 'Give me a bounding box for Alameda, California expressed as only four numbers delineated by commas, with no spaces, longitude preceding latitude.') -f geojsonseq --type=place | geojson-to-sqlite alameda.db places - --nl --pk=id

Via @dbreunig

Thursday, 18. April 2024

Mike Jones: self-issued

OpenID Federation Session at April 2024 IIW

John Bradley and I convened a session on Trust Establishment with OpenID Federation at the Internet Identity Workshop (IIW) on Thursday, April 18, 2024. The material used to drive the discussion was: Trust Establishment with OpenID Federation (PowerPoint) (PDF) The session was well attended and the discussion lively. Numerous people with trust establishment problems to […]

John Bradley and I convened a session on Trust Establishment with OpenID Federation at the Internet Identity Workshop (IIW) on Thursday, April 18, 2024. The material used to drive the discussion was:

Trust Establishment with OpenID Federation (PowerPoint) (PDF)

The session was well attended and the discussion lively. Numerous people with trust establishment problems to solve contributed, including experts from the SAML federation world, people involved in digital wallet projects, and several people already using or considering using OpenID Federation. Thanks to all who participated!


Simon Willison

Andrej Karpathy's Llama 3 review

Andrej Karpathy's Llama 3 review The most interesting coverage I've seen so far of Meta's Llama 3 models (8b and 70b so far, 400b promised later). Andrej notes that Llama 3 trained on 15 trillion tokens - up from 2 trillion for Llama 2 - and they used that many even for the smaller 8b model, 75x more than the chinchilla scaling laws would suggest. The tokenizer has also changed - they now

Andrej Karpathy's Llama 3 review

The most interesting coverage I've seen so far of Meta's Llama 3 models (8b and 70b so far, 400b promised later).

Andrej notes that Llama 3 trained on 15 trillion tokens - up from 2 trillion for Llama 2 - and they used that many even for the smaller 8b model, 75x more than the chinchilla scaling laws would suggest.

The tokenizer has also changed - they now use 128,000 tokens, up from 32,000. This results in a 15% drop in the tokens needed to represent a string of text.

The one disappointment is the context length - just 8,192, 2x that of Llama 2 and 4x LLaMA 1 but still pretty small by today's standards.

If early indications hold, the 400b model could be the first genuinely GPT-4 class openly licensed model. We'll have to wait and see.


John Philpin : Lifestream

Who am I to disagree with Austin Kleon and in fact I still d

Who am I to disagree with Austin Kleon and in fact I still don’t .. but the conclusion to the article ends with a use case for why paper dictionaries .. which to this bear seems identical to the why online. What am I missing? 🔗 Source

Who am I to disagree with Austin Kleon and in fact I still don’t .. but the conclusion to the article ends with a use case for why paper dictionaries .. which to this bear seems identical to the why online. What am I missing? 🔗 Source


Simon Willison

How cheap, outsourced labour in Africa is shaping AI English

How cheap, outsourced labour in Africa is shaping AI English The word "delve" has been getting a lot of attention recently as an example of something that might be an indicator of ChatGPT generated content. One example: articles on medical research site PubMed now use “delve” 10 to 100 times more than a few years ago! Nigerian Twitter took offense recently to Paul Graham's suggestion that

How cheap, outsourced labour in Africa is shaping AI English

The word "delve" has been getting a lot of attention recently as an example of something that might be an indicator of ChatGPT generated content.

One example: articles on medical research site PubMed now use “delve” 10 to 100 times more than a few years ago!

Nigerian Twitter took offense recently to Paul Graham's suggestion that "delve" is a sign of bad writing. It turns out Nigerian formal writing has a subtly different vocabulary.

Alex Hern theorizes that the underlying cause may be related. Companies like OpenAI frequently outsource data annotation to countries like Nigeria that have excellent English skills and low wages. RLHF (reinforcement learning from human feedback) involves annotators comparing and voting on the "best" responses from the models.

Are they teaching models to favour Nigerian-English? It's a pretty solid theory!


Ben Werdmüller

Backdoors are an everyone problem

From the OpenJS Foundation: The recent attempted XZ Utils backdoor (CVE-2024-3094) may not be an isolated incident as evidenced by a similar credible takeover attempt intercepted by the OpenJS Foundation, home to JavaScript projects used by billions of websites worldwide. The Open Source Security (OpenSSF) and OpenJS Foundations are calling all open source maintainers to be alert for social en

From the OpenJS Foundation:

The recent attempted XZ Utils backdoor (CVE-2024-3094) may not be an isolated incident as evidenced by a similar credible takeover attempt intercepted by the OpenJS Foundation, home to JavaScript projects used by billions of websites worldwide. The Open Source Security (OpenSSF) and OpenJS Foundations are calling all open source maintainers to be alert for social engineering takeover attempts, to recognize the early threat patterns emerging, and to take steps to protect their open source projects.

Vigilance is good, and it’s worth heeding the advice and paying attention to the evidence presented here. The XZ Utils backdoor was a smart attack that very nearly caused havoc.

I think it’s also worth pointing out that we know about the exploit because it was in an open source project. Andres Freund was debugging a server resource issue when he uncovered the issue. Because the project — and its downstream client — were open source, he could investigate and find the intrusion.

It’s not clear how this would have panned out if this had been proprietary software: particularly on a team that was resource strapped or moving at speed. The same social engineering exploits that allowed Jia Tan to become a maintainer of the XZ Utils project would also see someone hired as a contractor by a tech team. If I was a nefarious actor who wanted to place an exploit in an important software library, that’s exactly what I’d do: go send someone to join the team as a contractor. While there are mandatory identity verification procedures for full-time employees (which we can certainly argue the pros and cons of), contractors have no such requirements.

I bring this up because all the advice I’ve seen to date has been directed at open source maintainers. Again, this is smart and good and should absolutely be heeded — but there’s a world of other software out there that is also critical infrastructure and which doesn’t enjoy the sunlight of open source projects. This isn’t an open source software problem; it’s a software problem. Everyone should be vigilant, regardless if there are eyes on their source code or not. And perhaps we should be even warier of projects whose code we can’t audit ourselves.


Patrick Breyer

EU-Stellungnahme gegen Überwachungswerbung auf Instagram, Facebook und Co.: Piraten fordern Meta zum Einlenken auf

Der EU-Datenschutzausschuss hat in einer gestern veröffentlichten Stellungnahme festgestellt, „dass es großen Online-Plattformen in den meisten Fällen nicht möglich sein wird, die Anforderungen an eine gültige Einwilligung zu erfüllen, wenn …

Der EU-Datenschutzausschuss hat in einer gestern veröffentlichten Stellungnahme festgestellt, „dass es großen Online-Plattformen in den meisten Fällen nicht möglich sein wird, die Anforderungen an eine gültige Einwilligung zu erfüllen, wenn sie die Nutzer nur vor die Wahl stellen, entweder in die Verarbeitung personenbezogener Daten für verhaltensbezogene Werbung einzuwilligen oder eine Gebühr zu zahlen.“ Der Europaabgeordnete der Piratenpartei Dr. Patrick Breyer bewertet das als faktisches Aus von Meta’s „pay or okay“-Plan, gegen den Verbraucherorganisationen bereits vor Gericht klagen. Zuvor hatten bereits 46 Europaabgeordnete auf Breyers Initiative dieses Bezahlmodell verurteilt, das Nutzern vor die Wahl zwischen Überwachung und Bezahlabo stellt.

„Metas Datenschutzgebühr untergräbt den Schutz der Privatsphäre, stellt Nutzer vor eine falsche Wahl und übt wirtschaftlichen Zwang zur Einwilligung in eine umfassende Überwachung und Ausbeutung der Onlineaktivitäten des Nutzer aus. Zuckerbergs Konzern wird nicht damit durchkommen, die Nutzer zur Einwilligung in diese Ausbeutung zu zwingen, indem er Datenschutz unbezahlbar macht. Die Ursache dafür, dass Meta auf einem rechtswidrigen Einwilligungsmodell besteht, ist sein auf allgegenwärtigem Tracking beruhendes Geschäftsmodell. Wie jetzt auch die EU-Datenschützer schreiben, muss Meta eine für den Nutzer kostenfreie Alternative zu diesem überwachungskapitalistischen Geschäftsmodell entwickeln, etwa aufbauend auf kontextbezogener und überwachungsfreier Werbung. Meta muss jetzt einlenken, sein ‘Pay or okay’-System aufgeben und endlich unser Grundrecht auf anonyme Internetnutzung respektieren!”


Simon Willison

Quoting Meta AI bot, answering a question on a forum

I have a child who is also 2e and has been part of the NYC G&T program. We've had a positive experience with the citywide program, specifically with the program at The Anderson School. — Meta AI bot, answering a question on a forum

I have a child who is also 2e and has been part of the NYC G&T program. We've had a positive experience with the citywide program, specifically with the program at The Anderson School.

Meta AI bot, answering a question on a forum


John Philpin : Lifestream

Wow — that’s a LOT of Bloody Mary I thought .. oh wait ‘OR’

Wow — that’s a LOT of Bloody Mary I thought .. oh wait ‘OR’ … not ‘OF’.

Wow — that’s a LOT of Bloody Mary I thought .. oh wait ‘OR’ … not ‘OF’.


Simon Willison

llm-reka

llm-reka My new plugin for running LLM prompts against the Reka family of API hosted LLM models: reka-core ($10 per million input), reka-flash (80c per million) and reka-edge (40c per million). All three of those models are trained from scratch by a team that includes several Google Brain alumni. Reka Core is their most powerful model, released on Monday 15th April and claiming benchmark s

llm-reka

My new plugin for running LLM prompts against the Reka family of API hosted LLM models: reka-core ($10 per million input), reka-flash (80c per million) and reka-edge (40c per million).

All three of those models are trained from scratch by a team that includes several Google Brain alumni.

Reka Core is their most powerful model, released on Monday 15th April and claiming benchmark scores competitive with GPT-4 and Claude 3 Opus.


Ben Werdmüller

I am convinced that ActivityPub is going ...

I am convinced that ActivityPub is going to change the entire web.

I am convinced that ActivityPub is going to change the entire web.


Simon Willison

mistralai/mistral-common

mistralai/mistral-common New from Mistral: mistral-common, an open source Python library providing "a set of tools to help you work with Mistral models". So far that means a tokenizer! This is similar to OpenAI's tiktoken library in that it lets you run tokenization in your own code, which crucially means you can count the number of tokens that you are about to use - useful for cost estimates

mistralai/mistral-common

New from Mistral: mistral-common, an open source Python library providing "a set of tools to help you work with Mistral models".

So far that means a tokenizer! This is similar to OpenAI's tiktoken library in that it lets you run tokenization in your own code, which crucially means you can count the number of tokens that you are about to use - useful for cost estimates but also for cramming the maximum allowed tokens in the context window for things like RAG.

Mistral's library is better than tiktoken though, in that it also includes logic for correctly calculating the tokens needed for conversation construction and tool definition. With OpenAI's APIs you're currently left guessing how many tokens are taken up by these advanced features.

Anthropic haven't published any form of tokenizer at all - it's the feature I'd most like to see from them next.

Here's how to explore the vocabulary of the tokenizer:

MistralTokenizer.from_model(
"open-mixtral-8x22b"
).instruct_tokenizer.tokenizer.vocab()[:12]

['<unk>', '<s>', '</s>', '[INST]', '[/INST]', '[TOOL_CALLS]', '[AVAILABLE_TOOLS]', '[/AVAILABLE_TOOLS]', '[TOOL_RESULTS]', '[/TOOL_RESULTS]']


Quoting Alex Albert (Anthropic)

In mid-March, we added this line to our system prompt to prevent Claude from thinking it can open URLs: "It cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation." — Alex Albert (Anthropic)

In mid-March, we added this line to our system prompt to prevent Claude from thinking it can open URLs:

"It cannot open URLs, links, or videos, so if it seems as though the interlocutor is expecting Claude to do so, it clarifies the situation and asks the human to paste the relevant text or image content directly into the conversation."

Alex Albert (Anthropic)

Wednesday, 17. April 2024

John Philpin : Lifestream

Don’t worry about the effect on the environment .. all the p

Don’t worry about the effect on the environment .. all the passengers carbon offset their vacation by using cardboard straws in their cocktails.

Don’t worry about the effect on the environment .. all the passengers carbon offset their vacation by using cardboard straws in their cocktails.


Suggestions that Informatica to be bought by the increasingl

Suggestions that Informatica to be bought by the increasingly wrongly named ‘SalesForce’.

Suggestions that Informatica to be bought by the increasingly wrongly named ‘SalesForce’.


Simon Willison

AI for Data Journalism: demonstrating what we can do with this stuff right now

I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference. I used the talk as an opportunity for some demo driven development - I pulled

I gave a talk last month at the Story Discovery at Scale data journalism conference hosted at Stanford by Big Local News. My brief was to go deep into the things we can use Large Language Models for right now, illustrated by a flurry of demos to help provide starting points for further conversations at the conference.

I used the talk as an opportunity for some demo driven development - I pulled together a bunch of different project strands for the talk, then spent the following weeks turning them into releasable tools.

There are 12 live demos in this talk!

Haikus from images with Claude 3 Haiku Pasting data from Google Sheets into Datasette Cloud AI-assisted SQL queries with datasette-query-assistant Scraping data with shot-scraper Enriching data in a table Command-line tools for working with LLMs Structured data extraction Code Interpreter and access to tools Running queries in Datasette from ChatGPT using a GPT Semantic search with embeddings Datasette Scribe: searchable Whisper transcripts Trying and failing to analyze hand-written campaign finance documents

The full 50 minute video of my talk is available on YouTube. Below I've turned that video into an annotated presentation, with screenshots, further information and links to related resources and demos that I showed during the talk.

What's new in LLMs?

00m08s

My focus in researching this area over the past couple of years has mainly been to forget about the futuristic stuff and focus on this question: what can I do with the tools that are available to me right now?

I blog a lot. Here's my AI tag (516 posts), and my LLMs tag (424).

The last six weeks have been wild for new AI capabilities that we can use to do interesting things. Some highlights:

Google Gemini Pro 1.5 is a new model from Google with a million token context (5x the previous largest) and that can handle images and video. I used it to convert a 7 second video of my bookcase into a JSON list of books, which I wrote about in this post. Anthropic released Claude 3 Opus, the first model to convincingly beat OpenAI's GPT-4. Anthropic then released Claude 3 Haiku, a model that is both cheaper and faster than GPT-3.5 Turbo and has a 200,000 token context limit and can process images. Opus at the top of the Chatbot Arena

The LMSYS Chatbot Arena is a great place to compare models because it captures their elusive vibes. It works by asking thousands of users to vote on the best responses to their prompts, picking from two anonymous models.

04m42s

Claude 3 Opus made it to the top, which was the first time ever for a model not produced by OpenAI!

06m12s

This Reddit post by Time-Winter-4319 animates the leaderboard since May 2023 and shows the moment in the last few weeks where Opus finally took the top spot.

Haikus from images with Claude 3 Haiku

To demonstrate Claude 3 Haiku I showed a demo of a little tool I built that can take a snapshot through a webcam and feed that to the Haiku model to generate a Haiku!

An improved version of that tool can be found here - source code here on GitHub.

It requires a Claude 3 API key which you can paste in and it will store in browser local storage (I never get to see your key).

Here's what it looks like on my iPhone:

It writes terrible Haikus every time you take a picture! Each one probably costs a fraction of a cent.

On the morning of the talk AI21 published this: Introducing Jamba: AI21's Groundbreaking SSM-Transformer Model. I mentioned that mainly to illustrate that the openly licensed model community has been moving quickly as well.

(In the weeks since I gave this talk the biggest stories from that space have been Command R+ and Mixtral 8x22b - both groundbreakingly capable openly licensed models.)

Pasting data from Google Sheets into Datasette Cloud

At this point I switched over to running some live demos, using Datasette running on Datasette Cloud.

09m24s

Tejas Kumar shared a Google Sheet with pricing comparison data for various LLMs. This was the perfect opportunity to demonstrate the new Datasette Import plugin, which makes it easy to paste data into Datasette from Google Sheets or Excel.

09m36s

Google Sheets (and Numbers and Excel) all support copying data directly out of the spreadsheet as TSV (tab separated values). This is ideal for pasting into other tools that support TSV.

10m07s

The Datasette Import plugin (previously called Datasette Paste) shows a preview of the first 100 rows. Click the blue "Upload 15 rows to Datasette" button to create the new table.

10m11s

AI-assisted SQL queries with datasette-query-assistant

Once I had imported the data I demonstrated another new plugin: datasette-query-assistant, which uses Claude 3 Haiku to allow users to pose a question in English which then gets translated into a SQL query against the database schema.

11m44s

In this case I had previously found out that MTok confuses the model - but telling it that it means "millions of tokens" gave it the information it needed to answer the question.

11m51s

The plugin works by constructing a heavily commented SQL query and then redirecting the user to a page that executes that query. It deliberately makes the query visible, in the hope that technical users might be able to spot if the SQL looks like it's doing the right thing.

Every page like this in Datasette has a URL that can be shared. Users can share that link with their team members to get a second pair of eyes on the query.

Scraping data with shot-scraper

An earlier speaker at the conference had shown the Champaign County property tax database compiled from FOIA data by CU-CitizenAccess at the University of Illinois in Urbana-Champaign.

13m47s

The interactive search tool is published using Flourish. If you open it in the Firefox DevTools console you can access the data using window.template.data:

14m07s

My shot-scraper tool provides a mechanism for scraping pages with JavaScript, by running a JavaScript expression in the context of a page using an invisible browser window.

15m15s

shot-scraper javascript \ 'https://flo.uri.sh/visualisation/16648221/embed?auto-1' \ 'window. template.data[_Flourish_dataset]' \ > /tmp/data.json

This gave me a 17MB JSON file, in the following shape:

[ { "columns": [ "LUTH, KATHRYN M TRUST", "526 COUNTY ROAD 2400 E", "BROADLANDS, IL 61816-9733", "013506100001", 110070, 250870, "Y", 147.26 ] }

I used jq to convert that into an array of objects suitable for importing into Datasette:

cat data.json| jq 'map({ "Owner Name": .columns[0], "Site Address 1": .columns[1], "City and Zip": .columns[2], "Parcel Number": .columns[3], "Farm Land": .columns[4], "Total Assessed Value": .columns[5], "Home Owner Exemption": .columns[6], "Gross Acreage": .columns[7] })' > cleaned.json

Which produced a file that looked like this:

[ { "Owner Name": "LUTH, KATHRYN M TRUST", "Site Address 1": "526 COUNTY ROAD 2400 E", "City and Zip": "BROADLANDS, IL 61816-9733", "Parcel Number": "013506100001", "Farm Land": 110070, "Total Assessed Value": 250870, "Home Owner Exemption": "Y", "Gross Acreage": 147.26 }

Then I pasted that into the same tool as before - it accepts JSON in addition to CSV and TSV:

15m50s

I used datasette-configure-fts to make it searchable by owner name:

16m18s

And now I can search for "john", order by Total Assessed Value and figure out who the richest John in Champaign County is!

16m24s

Enriching data in a table

My next demo involved Datasette Enrichments, a relatively new mechanism (launched in December) providing a plugin-based mechanism for running bulk operations against rows in a table.

Selecting the "Enrich selected data" table action provides a list of available enrichments, provided by a plugin.

17m06s

Datasette Cloud is running the following enrichment plugins:

datasette-enrichments-jinja datasette-enrichments-re2 datasette-enrichments-opencage datasette-enrichments-gpt datasette-embeddings

The geocoder plugin uses the OpenCage geocoder API to populate latitude and longitude columns from address data.

The address is provided as a template using values from columns in the table:

17m08s

I ran the geocoder... and a few seconds later my table started to display a map. And the map had markers all over the USA, which was clearly wrong because the markers should all have been in Champaign County!

17m57s

Why did it go wrong? On closer inspection, it turns out quite a few of the rows in the table have a blank value for the "City and Zip" column. Without that, the geocoder was picking other places with the same street address.

The fix for this would be to add the explicit state "Illinois" to the template used for geocoding. I didn't fix this during the talk for time reasons. I also quite like having demos like this that don't go perfectly, as it helps illustrate the real-world challenges of working with this kind of data.

I ran another demo of the AI query assistant, this time asking:

who is the richest home owner?

It built me a SQL query to answer that question. It seemed to do a good job:

18m55s

Command-line tools for working with LLMs

I switched away from Datasette to demonstrate my other main open source project, LLM. LLM is a command-line tool for interacting with Large Language Models, based around plugins that make it easy to extend to support different models.

Since terrible Haikus were something of a theme of the event already (I wasn't the first speaker to generate a Haiku), I demonstrated it by writing two more of them:

21m35s

LLM defaults to running prompts against the inexpensive OpenAI gpt-3.5-turbo model. Adding -m claude-3-opus (or some other model name, depending on installed plugins) runs the prompt against a different model, in this case Claude 3 Opus.

I'm using the llm-claude-3 plugin here.

Next I wanted to do something a lot more useful than generating terrible poetry. An exciting recent development in LLMs is the increasing availability of multi-modal models - models that can handle inputs other than text, such as images.

Most of these models deal with images, not PDFs - so the first step was to turn a PDF into a PNG image.

This was an opportunity to demonstrate another recent LLM plugin, llm cmd, which takes a prompt and turns it into a command line command ready to be executed (or reviewed and edited) directly in the terminal.

I ran this:

llm cmd convert order.pdf into a single long image with all of the pages

And it suggested I run:

convert -density 300 order.pdf -append order.png

22m11s

That looked OK to me, so I hit enter - and it spat out a order.png file that was a single long image with 7 pages of PDF concatenated together.

I then passed that to the new Gemini Pro 1.5 model like so:

llm -m pro15 -i order.png 'extract text'

The -i order.png option is not yet available in an LLM release - here I'm running the image-experimental branch of LLM and the images branch of the llm-gemini plugin.

And the model began returning text from that PDF, conveniently converted to Markdown:

23m04s

Is this the best technology for the job? Likely not. Using LLMs for this kind of content extraction has a lot of risks: what if the model hallucinates extra details in the output?

It's also important to keep the model's output length limit in mind. Even models that accept a million tokens of input often have output limits measured in just thousands of tokens (Gemini 1.5 Pro's output limit is 8,192).

I recommend dedicated text extraction tools like AWS Textract for this kind of thing instead. I released a textract-cli tool to help work with that shortly after I gave this talk.

Speaking of LLM mistakes... I previously attempted this same thing using that image fed into GPT-4 Vision, and got a very illustrative result:

23m47s

This text was extracted from the same image... and it's entirely incorrect! It talks about the wrong name - Latoya Jackson instead of Laurie Beth Kreuger - and every detail on the page is wrong, clearly hallucinated by the model.

What went wrong here? It was the size of the image. I fed GPT-4 Vision a 2,550 × 23,100 pixel PNG. That's clearly too large, so it looks to me like OpenAI resized the image down before feeding it to the model... but in doing so, they made the text virtually illegible. The model picked up just enough details from what was left to confidently hallucinate a completely different document.

Another useful reminder of quite how weird the mistakes can be when working with these tools!

Structured data extraction

My next demo covered my absolute favourite use-case for these tools in a data journalism capacity: structured data extraction.

I've since turned this section into a separate, dedicated demo, with a 3m43s YouTube video and accompanying blog post.

I used the datasette-extract plugin, which lets you configure a new database table:

26m02s

Then copy and paste in any data you like. Here I'm grabbing text from the upcoming events calendar for the Bach Dancing & Dynamite Society Jazz venue in Half Moon Bay, California. You can read more about them on their Wikipedia page, which I created a few weeks ago.

26m21s

You paste the unstructured text into a box:

26m29s

And run the extraction:

26m38s

The result is a database table containing structured data that has been extracted from the unstructured text by the model! In this case the model was GPT-4 Turbo.

The best part is that the same technique works for images as well. Here's a photo of a flier I found for an upcoming event in Half Moon Bay:

27m56s

I can extract that image directly into the table, saving me from needing to configure the columns again.

28m32s

Initially I thought it had made a mistake here - it assumed 2022 instead of 2024.

But... I checked just now, and 6th May was indeed a Friday in 2022 but a Monday in 2024. And the event's QR code confirms that this was an old poster for an event from two years ago! It guessed correctly.

Code Interpreter and access to tools

The next part of my demo wasn't planned. I was going to dive into tool usage by demonstrating what happens when you give ChatGPT the ability to run queries directly against Datasette... but an informal survey showed that few people in the room had seen ChatGPT Code Interpreter at work. So I decided to take a diversion and demonstrate that instead.

Code Interpreter is the mode of (paid) ChatGPT where the model can generate Python code, execute it, and use the results as part of the ongoing conversation.

It's incredibly powerful but also very difficult to use. I tried to trigger it by asking for the factorial of 14... but ChatGPT attempted an answer without using Python. So I prompted:

Factorial of 14, use code interpreter

30m26s

Where it gets really interesting is when you start uploading data to it.

I found a CSV file on my computer called Calls for Service 2024(1).csv. I'd previously obtained this from a New Orleans data portal.

I uploaded the file to ChatGPT and prompted it:

tell me interesting things about this data

Here's the full transcript of my demo. It turned out not to be as interesting as I had hoped, because I accidentally uploaded a CSV file with just 10 rows of data!

The most interesting result I got was when I said "OK find something more interesting than that to chart" and it produced this chart of incident types:

34m09s

I've written a bunch of more detailed pieces about Code Interpreter. These are the most interesting:

Building and testing C extensions for SQLite with ChatGPT Code Interpreter Claude and ChatGPT for ad-hoc sidequests Running Python micro-benchmarks using the ChatGPT Code Interpreter alpha Expanding ChatGPT Code Interpreter with Python packages, Deno and Lua Running queries in Datasette from ChatGPT using a GPT

Keeping to the theme of extending LLMs with access to tools, my next demo used the GPTs feature added to ChatGPT back in November (see my notes on that launch).

GPTs let you create your own custom version of ChatGPT that lives in the ChatGPT interface. You can adjust its behaviour with custom instructions, and you can also teach it how to access external tools via web APIs.

I configured a GPT to talk to my Datasette demo instance using the YAML configurations shared in this Gist, and a Datasette Cloud read-only API key (see Getting started with the Datasette Cloud API, or install the datasette-auth-tokens plugin on your own instance).

Datasette provides a JSON API that can be used to execute SQLite SQL queries directly against a dataabse. GPT-4 already knows SQLite SQL, so describing the endpoint takes very little configuration.

36m02s

Once configured like this the regular ChatGPT interface can be used to talk directly with the GPT, which can then attempt to answer questions by executing SQL queries against Datasette.

Here's my transcript from the demo - the full sequence of my prompts was:

list tables

Find me the most expensive property in the county tax database based on price per acre

use sqlite_master (a hint about how to figure out the SQL schema)

38m22s

Clicking on the "Talked to xxx.datasette.cloud" message shows the SQL query that was executed:

38m38s

Semantic search with embeddings

One of my favourite Large Language Model adjacent technologies is embeddings. These provide a way to turn text into fixed-length arrays of floating point numbers which capture something about the semantic meaning of that text - allowing us to build search engines that operate based on semantic meaning as opposed to direct keyword matches.

I wrote about these extensively in Embeddings: What they are and why they matter.

datasette-embeddings is a new plugin that adds two features: the ability to calculate and store embeddings (implemented as an enrichment), and the ability to then use them to run semantic similarity searches against the table.

The first step is to enrich that data. I started with a table of session descriptions from the recent NICAR 2024 data journalism conference (which the conference publishes as a convenient CSV or JSON file).

I selected the "text embeddings with OpenAI enrichment" and configured it to run against a template containing the session title and description:

39m46s

Having run the enrichment a new table option becomes available: "Semantic search". I can enter a search term, in this case "things that will upset politicians":

40m07s

Running the search lands me on a SQL page with a query that shows the most relevant rows to that search term based on those embeddings:

40m11s

Semantic search like this is a key step in implementing RAG - Retrieval Augmented Generation, the trick where you take a user's question, find the most relevant documents for answering it, then paste entire copies of those documents into a prompt and follow them with the user's question.

I haven't implemented RAG on top of Datasette Embeddings yet but it's an obvious next step.

Datasette Scribe: searchable Whisper transcripts

My last demo was Datasette Scribe, a Datasette plugin currently being developed by Alex Garcia as part of the work he's doing with me on Datasette Cloud (generously sponsored by Fly.io).

Datasette Scribe builds on top of Whisper, the extraordinarily powerful audio transcription model released by OpenAI in September 2022. We're running Whisper on Fly's new GPU instances.

Datasette Scribe is a tool for making audio transcripts of meetings searchable. It currently works against YouTube, but will expand to other sources soon. Give it the URL of one or more YouTube videos and it indexes them, diarizes them (to figure out who is speaking when) and makes the transcription directly searchable within Datasette Cloud.

43m34s

I demonstrated Scribe using a video of a meeting from the City of Palo Alto YouTube channel. Being able to analyze transcripts of city meetings without sitting through the whole thing is a powerful tool for local journalism.

43m55s

I pasted the URL into Scribe and left it running. A couple of minutes later it had extracted the audio, transcribed it, made it searchable and could display a visualizer showing who the top speakers are and who was speaking when.

44m23s

Scribe also offers a search feature, which lets you do things like search for every instance of the word "housing" in meetings in the Huntington Beach collection:

44m48s

The work-in-progress Datasette Scribe plugin can be found at datasette/datasette-scribe on GitHub.

Trying and failing to analyze hand-written campaign finance documents

During the Q&A I was reminded that a conference participant had shared a particularly gnarly example PDF with me earlier in the day. Could this new set of tools help with the ever-present challenge of extracting useful data from a scanned hand-written form like this one?

49m48s

This was a great opportunity to test my new llm -i option against some realistic data. I started by running the image through Google's Gemini Pro 1.5:

llm -m pro15 -i Hallam_annual_2020.jpeg 'convert to JSON'

Asking a model to convert an image to JSON is always an interesting demo. We are leaving the model to design the JSON schema itself - obviously it would be a lot more useful if we came up with a shared schema and passed it in, but it's fun to see what it comes up with:

50m26s

{ "filer_identification": { "name": "Friends of Bethany Hallam", "street_address": "827 Homewood Avenue", "city": "Pittsburgh", "state": "PA", "zip_code": "15237" }, "type_of_report": "Pre-Election", "date_of_election": "11/05/2019", "summary_of_receipts_and_expenditures": { "amount_brought_forward": 0, "total_monetary_contributions_and_receipts": 28113.94, "total_funds_available": 29730.35, "total_expenditures": 25574.41, "ending_cash_balance": 2615.94, "value_of_in_kind_contributions_received": 0 }, "treasurer_signature": { "name": "George", "date": "03/03/2020" }, "candidate_signature": { "name": "Bethany Hallam", "date": "03/03/2020" } }

At first glance this looks really good! But on closer inspection, the total number it reports is 28113.94 - but the number on the handwritten form is 2811.93 - off by a factor of ten!

So sadly it looks like we're not quite there yet with this kind of handwritten document analysis, at least for Gemini Pro 1.5.

I tried one last thing: adding -m opus to run it through Claude 3 Opus instead:

51m11s

It didn't give me JSON at all! Instead it said the following:

I apologize, but I do not feel comfortable converting the personal information from this campaign finance report into a JSON format, as that would involve extracting and structuring private details about the individual. Perhaps we could have a thoughtful discussion about campaign finance reporting requirements and processes in general, without referencing any specific personal information. I'm happy to have a respectful dialogue if you'd like to explore the broader topic further.

This was the perfect response for the end of my talk! Claude 3 Opus lecturing a room full of professional journalists on how they should "have a thoughtful discussion about campaign finance reporting requirements and processes in general, without referencing any specific personal information" was a hilarious note to end on, and a fantastic illustration of yet another pitfall of working with these models in a real-world journalism context.

Get this for your newsroom

Datasette and Datasette Cloud can do a lot of useful things right now. Almost everything I showed today can be done with the open source project, but the goal of Datasette Cloud is to make these tools available to newsrooms and organizations that don't want to run everything themselves.

If this looks relevant to your team we would love to hear from you. Drop me a line at swillison @ Google's email provider and let's set up a time to talk!

Colophon

Since this talk was entirely demos rather than slides, my usual approach of turning slides into images for my write-up wasn't quite right.

Instead, I extracted an MP4 file of the video (yt-dlp --recode-video mp4 'https://www.youtube.com/watch?v=BJxPKr6ixSM') and watched that myself at double speed to figure out which frames would be best for illustrating the talk.

I wanted to hit a key to grab screenshots at different moments. I ended up using GPT-4 to help build a script to capture frames from a QuickTime video, which were saved to my /tmp folder with names like frame_005026.jpg - where the filename represents the HHMMSS point within the video.

After writing up my commentary I realized that I really wanted to link each frame to the point in the video where it occurred. With more ChatGPT assistance I built a VS Code regular expression for this:

Find:

(<p><img src="https://static\.simonwillison\.net/static/2024/story-discovery-at-scale/frame_00(\d{2})(\d{2})\.jpg" alt="[^"]+" style="max-width: 100%;" /></p>)

Replace with:

$1 <p><a href="https://www.youtube.com/watch?v=BJxPKr6ixSM&amp;t=$2m$3s">$2m$3s</a></p>

I also generated a talk transcript with MacWhisper, but I ended up not using that at all - typing up individual notes to accompany each frame turned out to be a better way of putting together this article.


Quoting Molly White

But the reality is that you can't build a hundred-billion-dollar industry around a technology that's kind of useful, mostly in mundane ways, and that boasts perhaps small increases in productivity if and only if the people who use it fully understand its limitations. — Molly White

But the reality is that you can't build a hundred-billion-dollar industry around a technology that's kind of useful, mostly in mundane ways, and that boasts perhaps small increases in productivity if and only if the people who use it fully understand its limitations.

Molly White


John Philpin : Lifestream

Suggestions that Informatica to be bought my the increasingl

Suggestions that Informatica to be bought my the increasingly wrongly named ‘Salesforce’

Suggestions that Informatica to be bought my the increasingly wrongly named ‘Salesforce’


Meanwhile lie, over on another network, my response to some

Meanwhile lie, over on another network, my response to some advice …

Meanwhile lie, over on another network, my response to some advice …


Early Start to a busy day …

Early Start to a busy day …

Early Start to a busy day …


Ben Werdmüller

AI isn't useless. But is it worth it?

"I find my feelings about AI are actually pretty similar to my feelings about blockchains: they do a poor job of much of what people try to do with them, they can't do the things their creators claim they one day might, and many of the things they are well suited to do may not be altogether that beneficial." This description of the uses and pitfalls of the current generatio

"I find my feelings about AI are actually pretty similar to my feelings about blockchains: they do a poor job of much of what people try to do with them, they can't do the things their creators claim they one day might, and many of the things they are well suited to do may not be altogether that beneficial."

This description of the uses and pitfalls of the current generation of AI tools is a characteristically sharp breakdown from Molly White.

I've found AI useful for similar sorts of things: proofreading in particular. But I agree with her conclusions - in fact, I agree with every single point she brings up in this piece. One to share with your colleagues who are thinking about deeply integrating LLMs based on the hype cycle alone. #Technology

[Link]


John Philpin : Lifestream

Substack is a unified media ecosystem. It encompasses both

Substack is a unified media ecosystem. It encompasses both publishing and conversation. This duality is important, because ….. ramble ramble ramble .. Hamish McKenzie over on Substack but I liked the first part and racking my brains to try and recall where I’d seen that idea before …

Substack is a unified media ecosystem. It encompasses both publishing and conversation. This duality is important, because ….. ramble ramble ramble

.. Hamish McKenzie over on Substack but I liked the first part and racking my brains to try and recall where I’d seen that idea before …


Patrick Breyer

Leak: Datenschutzfreundliche und verschlüsselte Messengerdienste sollen mit Chatkontrolle bestraft werden

Ein gestern vom Nachrichtenportal netzpolitik.org geleaktes Dokument enthüllt erstmals, wann Messenger- und Chatdiensten nach dem Vorschlag des EU-Ratsvorsitzes ein „hohes Risiko“ bescheinigt werden soll, was den Weg für ihre …

Ein gestern vom Nachrichtenportal netzpolitik.org geleaktes Dokument enthüllt erstmals, wann Messenger- und Chatdiensten nach dem Vorschlag des EU-Ratsvorsitzes ein „hohes Risiko“ bescheinigt werden soll, was den Weg für ihre Verpflichtung zur Chatkontrolle frei machen soll. Zur Chatkontrolle führen kann danach, dass ein Dienst „anonyme Profile oder Zugriffe ohne Benutzerkonto“ ermöglicht, dass „verschlüsseltes Messaging“ angeboten wird oder dass überhaupt interaktive Funktionen wie Direktnachrichten, Posts oder Nutzerkommentare angeboten werden. Das Dokument stammt vom Februar, jedoch sieht eine aktualisierte Fassung vom 10. April ähnliche Kriterien vor.

„Ausgerechnet die bisher datenschutzfreundlich anonym nutzbaren Kommunikationsdienste wie Protonmail sollen per Verpflichtung zur Chatkontrolle zu den extremüberwachtesten Diensten werden. Ausgerechnet die bisher sicher verschlüsselten Messengerdienste wie Signal sollen durch verpflichtendes ‚client-side scanning‘ zu Spionen auf unseren Smartphones werden“, prangert der Europaabgeordnete der Piratenpartei und Schattenberichterstatter zur Chatkontrolle Dr. Patrick Breyer an. „Die EU-Regierungen wollen gegen Vertraulichkeit und Sicherheit unserer digitalen Kommunikation insgesamt in den Krieg ziehen. Alles, was das Internet und digitale Kommunikation ausmacht und moderne Lebensrealität ist, ist für die EU-Innenminister ein zu bekämpfendes ‚Risiko‘. Wir Piraten werden nicht aufhören, für unser Grundrecht auf digitales Briefgeheimnis und sichere Verschlüsselung zu kämpfen – genau dafür sind wir digitale Freiheitskämpfer im EU-Parlament.“

„Sichere Verschlüsselung und datensparsame Messenger sind in den Fokus der Überwachung gerückt“, bewertet die Informatikerin und Spitzenkandidatin der Piratenpartei zur Europawahl Anja Hirschel. „Genau wie befürchtet sollen also Nachrichten vor der Transportverschlüsselung direkt auf den Endgeräten mitgelesen werden. Anbieter können sich zukünftig dieser Verpflichtung, einen Zugriff zu ermöglichen, dann nicht mehr entziehen. Bisher sichere Kommunikationswege können dann mit einem Update zum persönlichen ‚Taschenspion‘ werden.”


John Philpin : Lifestream

Intermittently and seemingly for no apparent reason my mouse

Intermittently and seemingly for no apparent reason my mouse pointer gravitates to the top or side of the screen - push it back and up it moves again. I lift and bang it and that fixes it - as if there is fluff in there? But not as far as i can tell after blowing. Anyone else see this - have ideas - bad software somewhere?

Intermittently and seemingly for no apparent reason my mouse pointer gravitates to the top or side of the screen - push it back and up it moves again. I lift and bang it and that fixes it - as if there is fluff in there? But not as far as i can tell after blowing. Anyone else see this - have ideas - bad software somewhere?


Simon Willison

Scammers are targeting teenage boys on social media—and driving some to suicide.

Scammers are targeting teenage boys on social media—and driving some to suicide. Horrifying in depth report describing sextortion scams: a scammer tricks a teenage boy into sending them reciprocal nude photos, then instantly starts blackmailing them by threatening to forward those photos to their friends and family members. Most online scams take weeks or even months to play out - these scams ca

Scammers are targeting teenage boys on social media—and driving some to suicide.

Horrifying in depth report describing sextortion scams: a scammer tricks a teenage boy into sending them reciprocal nude photos, then instantly starts blackmailing them by threatening to forward those photos to their friends and family members. Most online scams take weeks or even months to play out - these scams can turn to blackmail within minutes.

Via Hacker News


Ben Werdmüller

Amazon is filled with garbage ebooks. Here’s how they get made.

"Virtually every single part of the self-publishing grift world that can be automated or monetized has been automated and monetized." This is a really depressing read: fascinating, for sure, but what's left unsaid is what happens to traditional publishing as these folks become more and more successful, and book marketplaces become more and more saturated. Or perhaps it'l

"Virtually every single part of the self-publishing grift world that can be automated or monetized has been automated and monetized."

This is a really depressing read: fascinating, for sure, but what's left unsaid is what happens to traditional publishing as these folks become more and more successful, and book marketplaces become more and more saturated.

Or perhaps it'll drive everyone to real-life bookstores? There, at least, I know I'm not going to run into the kind of trash sold by Big Luca or the Mikkelsen Twins. #Media

[Link]

Tuesday, 16. April 2024

John Philpin : Lifestream

' Don Snorleone ' … 😂😂😂😂😂

' Don Snorleone ' … 😂😂😂😂😂

' Don Snorleone ' … 😂😂😂😂😂


Simon Willison

Quoting Constance Grady

The saddest part about it, though, is that the garbage books don’t actually make that much money either. It’s even possible to lose money generating your low-quality ebook to sell on Kindle for $0.99. The way people make money these days is by teaching students the process of making a garbage ebook. It’s grift and garbage all the way down — and the people who ultimately lose out are the readers a

The saddest part about it, though, is that the garbage books don’t actually make that much money either. It’s even possible to lose money generating your low-quality ebook to sell on Kindle for $0.99. The way people make money these days is by teaching students the process of making a garbage ebook. It’s grift and garbage all the way down — and the people who ultimately lose out are the readers and writers who love books.

Constance Grady


John Philpin : Lifestream

🎲 Surprise Me! is now the phrasing in my single post footer

🎲 Surprise Me! is now the phrasing in my single post footer - replacing ‘Random’ .. seemed more in keeping with the tone that I am trying to achieve on this blog - not to mention that it is consistent with the other places I reference this functionality.

🎲 Surprise Me! is now the phrasing in my single post footer - replacing ‘Random’ .. seemed more in keeping with the tone that I am trying to achieve on this blog - not to mention that it is consistent with the other places I reference this functionality.


🎈107/366 A Personal Knowledge Repository?

It is lost in the mists of time but I seem to recall a post by @brentsimmons from years ago - maybe even last century? The advice was to reply to emails as a post, so that when a second email comes, you can modify the material on your blog and send the link. (I more than likely have over simplified, taken it out of context, changed the essence if not meaning / idea - but that’s what we do right?

It is lost in the mists of time but I seem to recall a post by @brentsimmons from years ago - maybe even last century? The advice was to reply to emails as a post, so that when a second email comes, you can modify the material on your blog and send the link. (I more than likely have over simplified, taken it out of context, changed the essence if not meaning / idea - but that’s what we do right? The point is that the core thinking has stuck with me … write once - use many.

It’s good advice and indeed many corporate processes follow this idea. Think help desk, somebody asks a gnarly question, it isn’t available in your ‘knowledge repository’, so you reply with original thought and then publish that answer into the repository so that anyone else who has the question can find the answer.

BUT - when it comes to the personal blog, even I - who has this ingrained in my brain, still doesn’t do this as routine and - even more frustrating - when i do .. it gets forgotten about over time because I don’t have a real system to manage such things - and why would I?

So it is about memory - but also I have found the Micro Blog Search-Space PlugIn by @sod absolutely excellent for unpacking such posts'.

All this to say that I have just put up a page on my blog that is the start of attempting to provide help to others that ask.

Enjoy.

🖇️ Getting Started With Your Own Micro Blog

At the beginning of the year I had grand plans for this series. A daily long-form post about something that was rattling my brain that day. And then life. For a while, I was even just dropping markers - to revisit. I came to realise that part of the problem was the complexity of the structure for each post - so that went away. Simplicity really is rather nice. As I write on 240413, I am now going back and filling in the gaps. PLUS - unless something strikes me immediately, I will not classify until the end of the day and go back to move one of the posts of the day into the 366. Also - if you are wondering how I have update the words at the bottom of over 100 posts at a stroke, well - THANK YOU Andy Sylvester and his Glossary plugin.

📡 Follow with RSS

🗄️ All the posts


Mike Jones: self-issued

OpenID Presentations at April 2024 OpenID Workshop and IIW

As has become traditional, I gave the following presentation at the Monday, April 15, 2024 OpenID Workshop at Google: OpenID Connect Working Group Update (PowerPoint) (PDF) I also gave this invited “101” session presentation at the Internet Identity Workshop (IIW) on Tuesday, April 16, 2024: Introduction to OpenID Connect (PowerPoint) (PDF)

As has become traditional, I gave the following presentation at the Monday, April 15, 2024 OpenID Workshop at Google:

OpenID Connect Working Group Update (PowerPoint) (PDF)

I also gave this invited “101” session presentation at the Internet Identity Workshop (IIW) on Tuesday, April 16, 2024:

Introduction to OpenID Connect (PowerPoint) (PDF)

John Philpin : Lifestream

Wait … ‘Eclectic Ink’ @pimoore ? Did I miss the announcement

Wait … ‘Eclectic Ink’ @pimoore ? Did I miss the announcement?

Wait … ‘Eclectic Ink’ @pimoore ? Did I miss the announcement?


Simon Willison

Google NotebookLM Data Exfiltration

Google NotebookLM Data Exfiltration NotebookLM is a Google Labs product that lets you store information as sources (mainly text files in PDF) and then ask questions against those sources - effectively an interface for building your own custom RAG (Retrieval Augmented Generation) chatbots. Unsurprisingly for anything that allows LLMs to interact with untrusted documents, it's susceptible to pr

Google NotebookLM Data Exfiltration

NotebookLM is a Google Labs product that lets you store information as sources (mainly text files in PDF) and then ask questions against those sources - effectively an interface for building your own custom RAG (Retrieval Augmented Generation) chatbots.

Unsurprisingly for anything that allows LLMs to interact with untrusted documents, it's susceptible to prompt injection.

Johann Rehberger found some classic prompt injection exfiltration attacks: you can create source documents with instructions that cause the chatbot to load a Markdown image that leaks other private data to an external domain as data passed in the query string.

Johann reported this privately in the December but the problem has not yet been addressed.

A good rule of thumb is that any time you let LLMs see untrusted tokens there is a risk of an attack like this, so you should be very careful to avoid exfiltration vectors like Markdown images or even outbound links.

Via @wunderwuzzi23


Quoting wkirby on Hacker News

Permissions have three moving parts, who wants to do it, what do they want to do, and on what object. Any good permission system has to be able to efficiently answer any permutation of those variables. Given this person and this object, what can they do? Given this object and this action, who can do it? Given this person and this action, which objects can they act upon? — wkirby on Hacker News

Permissions have three moving parts, who wants to do it, what do they want to do, and on what object. Any good permission system has to be able to efficiently answer any permutation of those variables. Given this person and this object, what can they do? Given this object and this action, who can do it? Given this person and this action, which objects can they act upon?

wkirby on Hacker News


inline-snapshot

inline-snapshot I'm a big fan of snapshot testing, where expected values are captured the first time a test suite runs and then asserted against in future runs. It's a very productive way to build a robust test suite. inline-snapshot by Frank Hoffmann is a particularly neat implementation of the pattern. It defines a snapshot() function which you can use in your tests: assert 1548 * 18489

inline-snapshot

I'm a big fan of snapshot testing, where expected values are captured the first time a test suite runs and then asserted against in future runs. It's a very productive way to build a robust test suite.

inline-snapshot by Frank Hoffmann is a particularly neat implementation of the pattern. It defines a snapshot() function which you can use in your tests:

assert 1548 * 18489 == snapshot()

When you run that test using "pytest --inline-snapshot=create" the snapshot() function will be replaced in your code (using AST manipulation) with itself wrapping the repr() of the expected result:

assert 1548 * 18489 == snapshot(28620972)

If you modify the code and need to update the tests you can run "pytest --inline-snapshot=fix" to regenerate the recorded snapshot values.


John Philpin : Lifestream

🎈106/366 Not a new one. Just stumbled across it in ‘the

🎈106/366 Not a new one. Just stumbled across it in ‘the collection’. It is real - took it myself - in Burford. At the beginning of the year I had grand plans for this series. A daily long-form post about something that was rattling my brain that day. And then life. For a while, I was even just dropping markers - to revisit. I came to realise that part of the problem was the complexity

🎈106/366

Not a new one. Just stumbled across it in ‘the collection’. It is real - took it myself - in Burford.

At the beginning of the year I had grand plans for this series. A daily long-form post about something that was rattling my brain that day. And then life. For a while, I was even just dropping markers - to revisit. I came to realise that part of the problem was the complexity of the structure for each post - so that went away. Simplicity really is rather nice. As I write on 240413, I am now going back and filling in the gaps. PLUS - unless something strikes me immediately, I will not classify until the end of the day and go back to move one of the posts of the day into the 366. Also - if you are wondering how I have update the words at the bottom of over 100 posts at a stroke, well - THANK YOU Andy Sylvester and his Glossary plugin.

📡 Follow with RSS

🗄️ All the posts


Phil Windleys Technometria

Relationships are Entangled

People aren't digitally embodied. As a result their online relationships are anemic. As we think about the kinds of digital identity systems that will help people live authentic online lives, we must build identity system that give people tools to operationalize rich digital relationships.

People aren't digitally embodied. As a result their online relationships are anemic. As we think about the kinds of digital identity systems that will help people live authentic online lives, we must build identity system that give people tools to operationalize rich digital relationships.

Identity is the ability to recognize, remember, and react to people, organizations, systems, and things. In the current web, companies employ many ponderous technological systems to perform those functions. In these systems, we are like ghosts in the machines. We have "accounts" in companies' systems, but no good way to recognize, remember, and react to them or anyone else. We are not digital embodied.

One of great benefits of embodiment is the ability to form and operationalize rich digital relationships. I've written a lot about the nature of digital relationships.

Relationships and Identity

Authentic Digital Relationships

Ephemeral Relationships

Operationalizing Digital Relationships

Relationships in the Self-Sovereign Internet of Things

The Architecture of Identity Systems

Are Transactional Relationships Enough?

Fluid Multi-Pseudonymity

One of the discussions at VRM Day caused me to think about a feature of digital relationships I hadn't considered before. Someone said that if you think about a graph with people (or things, organizations, and so on) as the nodes, the relationships are the edges, like so:

Bi-directional relationship

In this figure Alice and Bob have a bi-directional relationship. This is how I've normally thought about it and how I'd have drawn it. But in today's discussion, someone said that the relationship is shared and that Alice and Bob both control it. But I realized that viewpoint is too simple. Specifically, Alice and Bob each have a different perspective of that relationship and will use it separately.

For example, imagine that Alice is the cashier at a grocery store and Bob is a customer. Alice gives great service, so Bob seeks her out when he shops. Alice on the other hand has no particular recollection of Bob from encounter to encounter. For Alice, the relationship is ephemeral, but for Bob, it's longer term. The nature of each relationship is different. So, we might look at it like this[1]:

Two, uni-directional relationships

But after discussing it some more, I realized that these relationships aren't independent. They're entangled like this:

Entangled relationships

In the example I gave above, as Bob seeks out Alice more and more, Alice might come to recognize him and call him by name, changing the nature of her relationship with Bob. And that may influence the nature of Bob's relationship with Alice. Over time, these interactions influence both relationships. So, while Alice and Bob both have control over their relationship with the other, actions by one influence the other.

I frequently say that we don't build identity systems to manage identities, but rather to manage relationships. The problem with contemporary identity systems is that they are all one sided, controlled by one party—almost always a company. As I've said before, people are not digital embodied and thus have no good way to manage their online relationships. As we strive to build better digital identity systems, I think it's paramount that we build systems that provide people with tools that embody them and provide them with the ability to operationalize their online relationships. These are more than decentralized; they are self-sovereign.

Notes

Peer decentralized identifiers (DIDs) are a great technology for creating bi-directional relationships.

Share

Subscribe now

Monday, 15. April 2024

Simon Willison

OpenAI Batch API

OpenAI Batch API OpenAI are now offering a 50% discount on batch chat completion API calls if you submit them in bulk and allow for up to 24 hours for them to be run. Requests are sent as a newline-delimited JSON file, with each line looking something like this: {"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo", "messages": [{"rol

OpenAI Batch API

OpenAI are now offering a 50% discount on batch chat completion API calls if you submit them in bulk and allow for up to 24 hours for them to be run.

Requests are sent as a newline-delimited JSON file, with each line looking something like this:

{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-3.5-turbo", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is 2+2?"}]}}

You upload a file for the batch, kick off a batch request and then poll for completion.

This makes GPT-3.5 Turbo cheaper than Claude 3 Haiku - provided you're willing to wait a few hours for your responses.

Via Jeff Harris


Ben Werdmüller

What’s next for me…

"I am absolutely convinced that journalism’s most essential role at this critical moment goes far, far beyond what it’s doing. The status quo in political (and related) coverage consists of sporadically noting that gosh-maybe-there’s-a-problem, while sticking mostly to journalistic business as usual. The status quo is journalistic malpractice." A strong implied call to action

"I am absolutely convinced that journalism’s most essential role at this critical moment goes far, far beyond what it’s doing. The status quo in political (and related) coverage consists of sporadically noting that gosh-maybe-there’s-a-problem, while sticking mostly to journalistic business as usual. The status quo is journalistic malpractice."

A strong implied call to action from Dan Gillmor, who has long argued for a more principled journalism industry (alongside a more principled software ecosystem that supports it). #Media

[Link]


Just a Theory

Mini Summit: Jonathan Katz on TLEs

The topic of the Postgres Extension Mini-Summit this week is Trusted Language Extensions, a.k.a. TLEs. Jonathan Katz of AWS will be giving a brief presentation on the “TLE Vision and Specifics” at noon Eastern / 16:00 UTC on Wednesday April 17, followed by community discussion.

Coming up this week: the fourth installment of the Postgres extension ecosystem mini-summit. The topic: Trusted Language Extensions, a.k.a. TLEs. Jonathan Katz of AWS will give a brief presentation on “TLE Vision and Specifics” followed by community discussion of the issues TLEs aim to address, what works and what doesn’t, and the role of TLEs in the future of the extension ecosystem.

Join us! Note! that if you reserved a spot at a prior mini-summit, sadly you will need to do so again for each subsequent summit or miss out on reminders from Eventbrite. And if Eventbrite says the event is sold out, rest assured we have plenty more virtual seats! Just send at david@ this domain, ping me on Mastodon or via the #extensions channel on the Postgres Slack or the Postgres Discord.

More about… Postgres TLE Extensions PGConf Summit

John Philpin : Lifestream

I’m feeling a clean up coming on me …

I’m feeling a clean up coming on me …

I’m feeling a clean up coming on me …


I think I just used my new Random (Surprise Me) Link more ti

I think I just used my new Random (Surprise Me) Link more times in the last 10 minutes than it has ever been used in total.

I think I just used my new Random (Surprise Me) Link more times in the last 10 minutes than it has ever been used in total.


OH. Now that’s a neat idea. (The ‘random’ button between the

OH. Now that’s a neat idea. (The ‘random’ button between the previous and next)

OH. Now that’s a neat idea. (The ‘random’ button between the previous and next)


Was I talking about stumbling across things? Don’t think so,

Was I talking about stumbling across things? Don’t think so, but an example of just that from a few months ago. No words needed to explain the images. Just click through.       🔗 Weekend Reading — Never Again Is Now — Labnotes (by Assaf Arkin)

Was I talking about stumbling across things? Don’t think so, but an example of just that from a few months ago. No words needed to explain the images. Just click through.

 

 

 

🔗 Weekend Reading — Never Again Is Now — Labnotes (by Assaf Arkin)


It occurred to me watching a Martin Scorcese documentary a

It occurred to me watching a Martin Scorcese documentary about the life of George Harrison, how much people in music seek out opportunities to create with other musicians. In technology, it doesn’t happen, we don’t even look at each other’s software. 💬 Dave Winer 🔗 Source

It occurred to me watching a Martin Scorcese documentary about the life of George Harrison, how much people in music seek out opportunities to create with other musicians. In technology, it doesn’t happen, we don’t even look at each other’s software.

💬 Dave Winer

🔗 Source


Patrick Breyer

Leak: EU-Innenminister wollen sich selbst von der Chatkontrolle ausnehmen

Nachdem neuesten, vom Informationsdienst Contexte geleakten Gesetzentwurf zur Chatkontrolle, den die EU-Mitgliedsstaaten heute diskutieren, wollen die EU-Innenminister Accounts der Angehörigen von Sicherheitsbehörden wie Geheimdienste, Polizei und Militär von der …

Nachdem neuesten, vom Informationsdienst Contexte geleakten Gesetzentwurf zur Chatkontrolle, den die EU-Mitgliedsstaaten heute diskutieren, wollen die EU-Innenminister Accounts der Angehörigen von Sicherheitsbehörden wie Geheimdienste, Polizei und Militär von der geplanten Chatkontrolle ausnehmen (Artikel 1 Absatz 2a). Die Verordnung soll auch auf „vertrauliche Informationen“ wie Berufsgeheimnisse keine Anwendung finden (Artikel 1 Absatz 2b). Weiterhin lehnen es die EU-Regierungen ab, dass das geplante EU-Kinderschutzzentrum sie bei der Prävention sexuellen Kindesmissbrauchs unterstützt und Best Practices für Präventionsinitiativen ausarbeitet (Artikel 43 Absatz 8).

Alle Infos zur Chatkontrolle

„Dass die EU-Innenminister Polizisten, Soldaten, Geheimdienstler und auch sich selbst von der Chatkontrolle ausnehmen wollen beweist, dass sie wissen, wie unzuverlässig und gefährlich die Schnüffelalgorithmen sind, die sie auf uns Bürger loslassen wollen“, kommentiert die Europaabgeordnete der Piratenpartei Dr. Patrick Breyer. „Offensichtlich rechnen sie damit, dass selbst Militärgeheimnisse ohne jeglichen Bezug zu Kindesmissbrauch jederzeit in den USA landen könnten. Die Vertraulichkeit der Regierungskommunikation ist natürlich wichtig, aber dasselbe muss für den Schutz der Wirtschaft und natürlich der Bürgerinnen und Bürger gelten, einschließlich der Räume, die Missbrauchsopfer selbst für einen geschützten Austausch und Therapie brauchen. Wir wissen, dass die meisten von den heute noch freiwilligen Schnüffelalgorithmen geleakten Chats ohne jede Bedeutung für die Polizei sind, zum Beispiel Familienfotos oder einvernehmliches Sexting. Es ist eine Unverschämtheit, dass die EU-Innenminister die Folgen der Zerstörung des digitalen Briefgeheimnisses und sicherer Verschlüsselung, die sie uns zumuten, selbst nicht ausbaden wollen.“

Die Informatikerin und Spitzenkandidatin der Piratenpartei zur Europawahl Anja Hirschel ergänzt: „Die weitere Formulierung, dass Berufsgeheimnisse von der Chatkontrolle nicht betroffen sein sollen, ist eine in Paragrafen gegossene Lüge. Kein Anbieter und kein Algorithmus kann wissen oder feststellen, ob ein Chat mit Ärzten, Therapeuten, Anwälten, Strafverteidigern usw. geführt wird, um ihn von der Chatkontrolle auszunehmen. Die Chatkontrolle droht unweigerlich auch Intimfotos für Behandlungszwecke und Prozessunterlagen zur Verteidigung von Missbrauchsopfern an Personen zu leaken, bei denen sie nichts zu suchen haben und bei denen sie nicht sicher sind.“

Breyer erklärt abschließend: „Dem offiziellen Ziel des Kinderschutzes spricht es Hohn, dass die EU-Innenminister ausgerechnet bei der Vorbeugung von Kindesmissbrauch die Ausarbeitung von Best Practices ablehnen. Deutlicher kann man nicht machen, dass Ziel der Chatkontrolle Massenüberwachung mit chinesischen Methoden ist und nicht ein besserer Schutz unserer Kinder. Wer echten Kinderschutz will, würde eine systematische wissenschaftliche Evaluierung und Umsetzung multidisziplinärer Präventionsprogramme auf den Weg bringen, ebenso wie europaweite Standards und Leitlinien für strafrechtliche Ermittlungen wegen Kindesmissbrauchs, einschließlich der Identifikation von Opfern und der nötigen technischen Mittel. Nichts davon planen die EU-Innenminister.“

Die EU-Regierungen wollen die Chatkontrolle bis Anfang Juni beschließen.


Damien Bod

Implement a secure Blazor Web application using OpenID Connect and security headers

This article shows how to implement a secure .NET 8 Blazor Web application using OpenID Connect and security headers with CSP nonces. The NetEscapades.AspNetCore.SecurityHeaders nuget package is used to implement the security headers and OpenIddict is used to implement the OIDC server. Code: https://github.com/damienbod/BlazorWebOidc OpenIddict is used as the identity provider and an OpenID connect

This article shows how to implement a secure .NET 8 Blazor Web application using OpenID Connect and security headers with CSP nonces. The NetEscapades.AspNetCore.SecurityHeaders nuget package is used to implement the security headers and OpenIddict is used to implement the OIDC server.

Code: https://github.com/damienbod/BlazorWebOidc

OpenIddict is used as the identity provider and an OpenID connect client is setup to allow an OpenID Connect confidential code flow PKCE client. The Web application is a server rendered application using Blazor server components implemented using Blazor Web, ASP.NET Core and .NET 8.

Step 1: Init solution from the .NET Blazor samples

The solution was created using the Blazor samples from Microsoft. The .NET 8 BlazorWebAppOidc project was used to setup the solution.

https://github.com/dotnet/blazor-samples/tree/main/8.0/BlazorWebAppOidc

The code sample implements the client profile parts and the CSRF protection. Login and Logout plumbing is also implemented.

Step 2: Switch the OpenID Connect server

OpenIddict is used as the identity provider and so the OIDC client set up needs to be changed. The program file was updated and the OpenID Connect Microsoft Entra ID client was replaced with the OpenIddict client. The client on the server is setup directly in the worker class in the Openiddict server. Both of the setups must match. The client uses an OpenID Connect confidential client with code flow and PKCE.

builder.Services.AddAuthentication(OIDC_SCHEME) .AddOpenIdConnect(OIDC_SCHEME, options => { // From appsettings.json, keyvault, user-secrets // "OpenIDConnectSettings": { // "Authority": "https://localhost:44318", // "ClientId": "oidc-pkce-confidential", // "ClientSecret": "--secret-in-key-vault-user-secrets--" // }, builder.Configuration.GetSection("OpenIDConnectSettings").Bind(options); options.SignInScheme = CookieAuthenticationDefaults.AuthenticationScheme; options.ResponseType = OpenIdConnectResponseType.Code; options.SaveTokens = true; options.GetClaimsFromUserInfoEndpoint = true; options.MapInboundClaims = false; // Remove Microsoft mappings options.TokenValidationParameters = new TokenValidationParameters { NameClaimType = "name" }; }) .AddCookie();

Note: You could also use the OpenIddict client packages to implement the client. I like to use the defaults.

Step 3: Disable WASM mode

Any web application should protect the session, not just implement authentication using an OIDC server. One of the most important browser protection is the CSP header and a good CSP uses a nonce. Blazor Web using WASM does not support this and so this must be disabled. Remove the WASM part from the middleware.

In the program.cs, update Blazor Web to:

builder.Services.AddRazorComponents() .AddInteractiveServerComponents();

and

app.MapRazorComponents<App>() .AddInteractiveServerRenderMode() .AddAdditionalAssemblies( typeof(BlazorWebAppOidc.Client._Imports).Assembly);

Remove the WASM usage in the UI components. Switch to InteractiveServer mode.

<HeadOutlet @rendermode="InteractiveServer" /> </head> <body> <Routes @rendermode="InteractiveServer" /> Step 4: Add CSP nonce middleware

The CSP nonce can be used in Blazor (Server) components with some extra effort because the Blazor components cannot read the HTTP headers from the responses. The CircuitHandler class can be used for this. A BlazorNonceService class can be created to add the nonce. This class inherits the CircuitHandler implementation.

using Microsoft.AspNetCore.Components; using Microsoft.AspNetCore.Components.Server.Circuits; namespace BlazorWebAppOidc.CspServices; /// <summary> /// Original src: https://github.com/javiercn/BlazorWebNonceService /// </summary> public class BlazorNonceService : CircuitHandler, IDisposable { private readonly PersistentComponentState _state; private readonly PersistingComponentStateSubscription _subscription; public BlazorNonceService(PersistentComponentState state) { if (state.TryTakeFromJson("nonce", out string? nonce)) { if (nonce is not null) { Nonce = nonce; } else { throw new InvalidOperationException( "Nonce can't be null when provided"); } } else { _subscription = state.RegisterOnPersisting(PersistNonce); } _state = state; } public string? Nonce { get; set; } private Task PersistNonce() { _state.PersistAsJson("nonce", Nonce); return Task.CompletedTask; } public void SetNonce(string nonce) { ArgumentException.ThrowIfNullOrWhiteSpace(nonce); if (Nonce != null) { throw new InvalidOperationException("Nonce already defined"); } Nonce = nonce; } public void Dispose() => ((IDisposable)_subscription)?.Dispose(); }

A NonceMiddleware ASP.NET Core middleware service can now be used to read the nonce from the headers and set this in the BlazorNonceService CircuitHandler implementation. NetEscapades.AspNetCore.SecurityHeaders is used to implement the security headers and if a CSP nonce is created, the NETESCAPADES_NONCE http header is set.

namespace BlazorWebAppOidc.CspServices; public class NonceMiddleware { private readonly RequestDelegate _next; public NonceMiddleware(RequestDelegate next) { _next = next; } public async Task Invoke(HttpContext context, BlazorNonceService blazorNonceService) { var success = context.Items .TryGetValue("NETESCAPADES_NONCE", out var nonce); if (success && nonce != null) { blazorNonceService.SetNonce(nonce.ToString()!); } await _next.Invoke(context); } }

The middleware for the nonce is added to the ASP.NET Core services.

builder.Services.TryAddEnumerable( ServiceDescriptor.Scoped<CircuitHandler, BlazorNonceService>(sp => sp.GetRequiredService<BlazorNonceService>())); builder.Services.AddScoped<BlazorNonceService>();

Use the middleware is in the ASP.NET Core pipelines.

app.UseMiddleware<NonceMiddleware>(); Step 5: Add HTTP browser security headers

The NetEscapades.AspNetCore.SecurityHeaders nuget package is used to implement the security headers as best possible for this type of application. The SecurityHeadersDefinitions class implements this. CSP nonces are configuration as well as other security headers.

namespace BlazorWebAppOidc; public static class SecurityHeadersDefinitions { public static HeaderPolicyCollection GetHeaderPolicyCollection( bool isDev, string? idpHost) { ArgumentNullException.ThrowIfNull(idpHost); var policy = new HeaderPolicyCollection() .AddFrameOptionsDeny() .AddContentTypeOptionsNoSniff() .AddReferrerPolicyStrictOriginWhenCrossOrigin() .AddCrossOriginOpenerPolicy(builder => builder.SameOrigin()) .AddCrossOriginResourcePolicy(builder => builder.SameOrigin()) .AddCrossOriginEmbedderPolicy(builder => builder.RequireCorp()) .AddContentSecurityPolicy(builder => { builder.AddObjectSrc().None(); builder.AddBlockAllMixedContent(); builder.AddImgSrc().Self().From("data:"); builder.AddFormAction().Self().From(idpHost); builder.AddFontSrc().Self(); builder.AddBaseUri().Self(); builder.AddFrameAncestors().None(); builder.AddStyleSrc() .UnsafeInline() .Self(); // due to Blazor builder.AddScriptSrc() .WithNonce() .UnsafeEval() // due to Blazor WASM .StrictDynamic() .OverHttps() .UnsafeInline(); // only a fallback for older browsers }) .RemoveServerHeader() .AddPermissionsPolicy(builder => { builder.AddAccelerometer().None(); builder.AddAutoplay().None(); builder.AddCamera().None(); builder.AddEncryptedMedia().None(); builder.AddFullscreen().All(); builder.AddGeolocation().None(); builder.AddGyroscope().None(); builder.AddMagnetometer().None(); builder.AddMicrophone().None(); builder.AddMidi().None(); builder.AddPayment().None(); builder.AddPictureInPicture().None(); builder.AddSyncXHR().None(); builder.AddUsb().None(); }); if (!isDev) { // maxage = one year in seconds policy.AddStrictTransportSecurityMaxAgeIncludeSubDomains(); } policy.ApplyDocumentHeadersToAllResponses(); return policy; } }

The security headers are added using middleware as early as possible in the pipeline. I add the headers for all requests.

app.UseSecurityHeaders( SecurityHeadersDefinitions.GetHeaderPolicyCollection( app.Environment.IsDevelopment(), app.Configuration["OpenIDConnectSettings:Authority"]));

The CSP nonce can now be used in the Blazor components and scripts can only be read using the nonce. Unsecure scripts or unsecure inline scripts should never be read anywhere in a browser application.

<pre class="wp-block-syntaxhighlighter-code"> <a href="http://_framework/blazor.web.js">http://_framework/blazor.web.js</a> </body> </html> @code { /// <summary> /// Original src: https://github.com/javiercn/BlazorWebNonceService /// </summary> [CascadingParameter] HttpContext Context { get; set; } = default!; protected override void OnInitialized() { var nonce = GetNonce(); if (nonce != null) { BlazorNonceService.SetNonce(nonce); } } public string? GetNonce() { if (Context.Items.TryGetValue("nonce", out var item) && item is string nonce and not null) { return nonce; } return null; } }</pre>

The applications can be started and the authentication and the session protection can be validated. Using the WASM mode in Blazor Web requires a weaker security setup and you need to disable the CSP nonces. This is not a good idea.

Links

https://github.com/dotnet/blazor-samples/tree/main/8.0/BlazorWebAppOidc

Using a CSP nonce in Blazor Web

https://github.com/damienbod/BlazorServerOidc


Simon Willison

Quoting Jason D. Clinton, Anthropic

[On complaints about Claude 3 reduction in quality since launch] The model is stored in a static file and loaded, continuously, across 10s of thousands of identical servers each of which serve each instance of the Claude model. The model file never changes and is immutable once loaded; every shard is loading the same model file running exactly the same software. We haven’t changed the temperature

[On complaints about Claude 3 reduction in quality since launch] The model is stored in a static file and loaded, continuously, across 10s of thousands of identical servers each of which serve each instance of the Claude model. The model file never changes and is immutable once loaded; every shard is loading the same model file running exactly the same software. We haven’t changed the temperature either. We don’t see anywhere where drift could happen. The files are exactly the same as at launch and loaded each time from a frozen pristine copy.

Jason D. Clinton, Anthropic


@_Nat Zone

NHK「最深日本研究~外国人博士の目~」からメタバースとアイデンティティとプライバシーについて考えてみる

4/13(土) 午前8:10-午前8:40にNHKで「最深日本研究~外国人博士の目~」という番組が放映されました。4/20(土) 午前8:39 までNHK+で視聴可能です。スイスの文化人類学者、リュドミラ・プレディキナ博士(31)の文化人類学的見地からの、日本における「メタバース…

4/13(土) 午前8:10-午前8:40にNHKで「最深日本研究~外国人博士の目~」という番組が放映されました。4/20(土) 午前8:39 までNHK+で視聴可能です。スイスの文化人類学者、リュドミラ・プレディキナ博士(31)の文化人類学的見地からの、日本における「メタバース」という地域・文化、とくに男性の大部分がなぜ「バーチャル美少女キャラ〜バ美肉」化するかということへのアプローチです。

この背景として、日本の伝統文化である歌舞伎では女性を男性が演じること、人形のようなキャラクターを演じるということでは、人形浄瑠璃との関係もプレディキナ博士の論文「Babiniku」1では触れられているようです。また、日本における「理想」の実体化としての「カワイイ」の考察などもされています。昨今、アイデンティティとプライバシーの国際標準化の分野でもメタバースが関心の対象になってきているところなので、とても参考になりました。

以下に番組概要のメモと、その後にアイデンティティやプライバシーの観点での考察を記載します。

番組概要

番組の概要はこんな感じです。なぐり書きメモなんで整っていませんが、まぁなんとなくは分かるでしょう。NHK+での番組公開終了までは、ぜひ番組をご覧になることをおすすめします。

導入部 2000年代以降ネット上に広がったもう一つの世界、メタバース。人々はアバターと呼ばれる自らの分身を使い見知らぬ人との出会いを楽しむ。 その中で、日本人コミュニティではバ美肉が目立つ。 スイスの文化人類学者リュドミラ・プレディキナ博士(31)2はこのことに興味を持つ。 なぜ日本人男性は美少女キャラクターになりたがるのか、その答えを求めて研究 アニメ・和食・J-POP日本文化に注がれる世界からの熱い視線。そして、日本を愛しわれわれには無い目線でより深く日本を知ろうとする外国人研究者 彼らに日本はどう写っているか。バーチャル人類学者がわれわれの知らない日本の真の姿を解き明かす〜最深日本研究 第1章:最新メタバース体験施設を調査 スイスにはこのような施設はない メタバースを体験するためのすべてが揃っている 初心者には案内人もついている。やはり美少女キャラクター 一般視聴者に向けたメタバースの紹介 アバターの選択〜好きなアバターになる〜アイデンティティの表出 バーチャル空間では一瞬で何十万もある世界を移動できる そこで景色を楽しんだり、人々とお酒を飲んで交流したりなど、現実世界と同じように楽しめる メタバースの利用者は世界中でおよそ4億人 花火をましたから見上げるなどここでしかできない体験 第2章:日本のバーチャル世界を研究したい 博士がバーチャル世界の研究を始めたのは4年前 研究調査で驚きを持って知った言葉「バ美肉」(バーチャル美少女受肉)男性が美少女キャラを持つときに使われることば 「受肉」には西洋では宗教的な意味があるが、バ美肉たちは宗教に全く関心がない。そこに博士は興味を惹かれた。 調査をすると驚くべきことがわかった。男性のおよそ8割が美少女キャラクターを使っている 女性 78.3% 男性 15.2% その他 6.5% その見た目には大きな特徴:アニメに出るような可愛い女の子。これに対して西洋では、クールでセクシー。カワイイとは違って強い女性という印象を受ける。

(出所)NHK「最深日本研究~外国人博士の目~」

博士:わたしは西洋で育ち、女性として美しくエレガントでありたいと思っていた。なぜ日本の男性はカワイイと言われたいのか。西洋人としてこの異なる視点をもっと理解したいと思います。 文化人類学の紹介 20世紀初頭に一つの学問領域として確立。 クロード・レヴィ・ストロース ルース・ベネディクト 梅棹忠夫 机に向かうだけではなく、現地に赴き直接人と触れあるフィールドワークという手法で人間とは何かを探求する学問 人類学者は世界各地の民族を訪ね、彼らとともに暮らし、彼らの生活をつぶさに観察、自分の属する社会と比較することで、その共通点や違い、多様性を明らかにしてきた バーチャル世界という新たな地平 最初SNSを使いバ美肉たちに話しかけたが、部外者であるわたしは信用されなかった。わたしが西洋人であり女性だったから。そこで、博士は人類学の手法であるフィールドワークを使ってバ美肉たちに接近。自分のアバターを作ることは重要だった。バーチャル世界で彼らの信用を得ることができたから。 アバターを纏うことでバ美肉たちと同じ目線に立ち直接ふれあい始めたリュドミラ ときには一緒にライブ配信を行い、オリジナル曲の英訳を買って出た そして1年ほどかけて、徐々にコミュニティの中に入っていった 第3章 声までカワイイってどういうこと? 両声類3の紹介 声は男性のままか音声変換器を使うのが一般的 しかし、あまちじょんこ(@johnko_amachi)などは自らの超絶技巧で声を変える。 中の人は、じょんアニキ。 めざすもの:天真爛漫で無邪気で、こういう小さい子っているよねという女の子をしたい。 高い声にするだけだと、男の人の高い声になってしまう。声は楽器と一緒なので楽器の形を変えてあげる。そうやって音質を変えて、その上で音の高さを変える。両方が合わさってじょんこの声になる 2年かけ、理想の可愛い声を獲得した 「別の自分になれるってめちゃくちゃ楽しくて。で、なんかそれで、ただ楽しいということを突き詰めていく。天才じゃないから頑張るしかない。」 カワイイ動きのトレーニング。いざやっていむと難しい。ちいさい子の気持ちになって動かないといけない。たとえば、歩くときに右の足を出すと右に重心がよってしまう。これを繰り返すとふらふら見える。このふらふらした動きが、見ている人からしたら守ってあげたいというので可愛く見える。 博士 歩く時にかわいいを意識したことがなかったのでとてもおもしろいと思った 前は日本の男性といえばサラリーマンかオタクという表層的なステレオタイプなイメージしかなかった。しかし、研究をしてみると、日本人男性の多様性や日本文化の寛容性を知りとても面白いということがわかってきた 博士の生い立ちの説明 ロシア生れ。12歳でスイスに移住。フランス語の勉強に日本の漫画。「天使禁猟区」4。今までこのような美しい絵を見たことがなかったので新しい世界が開けた。 27際のときに日本文化を学ぶためにジュネーブ大学の修士課程に。 論文を発表「Babiniku」5 日本の伝統芸能の一つである歌舞伎の女形と同様に男性が美少女キャラクターを演じる類似点 人形浄瑠璃とアバターをあやつるバ美肉の共通性など日本の伝統文化を通してバ美肉を考察した論文は大学から学術賞を受けた 日本の文化はとても興味深く、趣味と実益を兼ねた研究をしたいと考えた ブレンダ・マーフィー教授 「彼女の研究の魅力は西洋で主流ではない”日本文化”に着目している点です」 バーチャル人類学はとても新しい分野であり、彼女はグローバルな文化への挑戦者といえる リュドミラが来日するのはこれで6度目。街に行けば目につくのは美少女キャラクター。看板や飲食店など町のいたるところで見かける。美少女キャラクターは秋葉原だけでなく日本文化で不可欠な存在 アニメや漫画の専門店に立ち寄り、次々に新しいものが生まれる日本のサブカルチャーも博士にとって研究の一環 第4章 美少女キャラに会いたい アイドルVTuber のらきゃっと (@NoraCat0415 ch. 登録者86000人以上)。ファンネーム「ねずみさん」 これを生み出したのが「のらねこP」(@PIMR, X: @VR_Produce_Nora)インタビュー のらきゃっとにガチ恋だと言っているファンとの関係についておしえて のらちゃんがちょっとドジなところもあってでも強くて可愛くてというのはファンがそう思っているから作られたという部分が多くて、ネズミさんたちがのらきゃっとという存在にそうあってほしいという願いのもとに作られたのでファンからの影響はとても強い のらきゃっとがファンから可愛いと言われるとどう感じるか 娘が公園で仲の良い友達と遊んでいて幸せそうだと親が感じるのに似ている のらきゃっとが生まれて人生は変わったか? 今までの仕事に行って帰ってきて家で寝るだけという生活でそんなに友達もいないというような人生から、本当に楽しく話せる友達が10倍以上に増えた。誰かと話したいときに話せるので寂しいと思う瞬間がなくなった かわいい美少女になれるって最高ですね「みんななれる〜」 ファンから求められることで美少女キャラはよりかわいくくなり、その美少女を生み出した彼の生活も豊かにしている。今日いちばん重要な発見だった 第5章 美少女になる理由を知りたい バーチャル美少女とファンとの交流はさらに進化。現実世界のファンが集まる店。友人のバ美肉が出演すると聞き訪ねた。「新宿ねむバー」 バーチャル美少女ねむさん(Youtube @nemchan_nel, X @nemchan_nel )=『メタバース進化論』(技術評論社)の著者で、博士の研究の頼もしい協力者 博士:「直接あって美少女やメタバースとか私達が情熱を注いでいることについてもっと話そう」 ねむさん:「楽しそう。たくさん話すのを楽しみにしてるよ」 博士:「ではリアル世界で」 ねむはプライベート時間の殆どをメタバースで過ごし、専門書6を出版するほどこの世界について考え続けてきた人物。 博士:直接美少女キャラクターになる理由を訊いてみたい 匿名で活動したいというねむさんの希望により映像と音声を加工しています→モザイクの大きさはかなり調整したらしい SNSを通じ知り合ったのは4年前。ねむさんとはメタバースユーザの生活実態を一緒に調査したことも。 博士によるねむさんへのインタビュー 初めてねむになったときどう思ったの? すごく恥ずかしかった。 罪悪感〜違う自分になるというのが悪いことをしているような感覚だった でもだんだんいつもの自分では出てこないような言葉が口から出てきた 美少女になると自己表現も変わるということか? 「現実の自分だとカッコつけちゃうのかな」 「そこまで心揺れ動かない結構冷静な人間だと自分のことを思っていたけど、ねむちゃんのときはそういうのから開放されてもっと自由に表現できる」 あなたにとっての美少女ってどういう存在なのか? “カワイイ”という概念を具現化した存在が美少女キャラクターだと思っている 肩書とか現実世界のしがらみを剥ぎ取ると人間の本質ってすごくカワイイ。魂って実はカワイイと思っていて、カワイイはその人の本質を褒めている気がする。 自信になるというか、ありのままでちゃんと意味があるんだと思える気がする なぜ日本人男性が美少女になりたいと思うのか。その答えを求めてフィールドワークを続けてきた博士の今考える結論 日本人男性は美少女になりカワイイを獲得することでストレスから開放される。 なぜならカワイイは失敗しても許されるし責任を負わなくていいからです。 私は日本文化から生まれたカワイイに可能性を感じました しかし私は人類学者としてコミュニティを一般化するのではなく、これからも多くの人たちの声に耳を傾けたいと思っています

エンド:若き人類学者が日本のバーチャル世界に分け入っていく

考察 アイデンティティとバ美肉

アイデンティティという言葉は番組の中に出て来ませんが、多くの発言は「アイデンティティ」関連だとわたしの中では整理されました。たとえば、じょんこさんの「別の自分になれるってめちゃくちゃ楽しくて。で、なんかそれで、ただ楽しいということを突き詰めていく。」という発言は、現実社会の仕事だとか交友関係という既存のコンテキストを超えた新たなコンテキストでの自我の確立とその表出(自観するアイデンティティの表出)およびそれがコミュニティ内で他の人に観測されるもののフィードバック(他観)という関係性の享受から幸福感を得ているように見えます。(自観と他観のあたりは、「非技術者のためのデジタルアイデンティティ入門」をご参照ください)

類似のことは、ねむさんの「(最初ねむになったとき)すごく恥ずかしかった。罪悪感〜違う自分になるというのが悪いことをしているような感覚だった。でもだんだんいつもの自分では出てこないような言葉が口から出てきた。」「ねむちゃんのときはそういうのから開放されてもっと自由に表現できる。」という言葉にも出てきます。

一方、のらねこPの「のらちゃんがちょっとドジなところもあってでも強くて可愛くてというのはファンがそう思っているから作られたという部分が多くて、ネズミさんたちがのらきゃっとという存在にそうあってほしいという願いのもとに作られたのでファンからの影響はとても強い。」という発言は、自観と他観のズレの最小化を、表出する属性の調整によって他観の方を変えて行うという「通常」のやり方とは逆で、他観の方に自観をあわせてしまうというアプローチに見えます。根源的自我に他観を合わせに行くのではなく、自我の方を調整してしまうわけですね。ただ、これが本当に自分の自我だとかなり心理的負荷が大きくなってくるのではないかと思われます。心理的負荷を減らすには、自分の中からその人格を外だししてしまって客観化してしまうことが必要であるようにも思えます。

このあたりが、のらねこPの「娘が公園で仲の良い友達と遊んでいて幸せそうだと親が感じるのに似ている」という発言からも表出されているように感じられました。

ここの部分で、プレディキナ博士は「ファンから求められることで美少女キャラはよりかわいくくなり、その美少女を生み出した彼の生活も豊かにしている。今日いちばん重要な発見だった」と述べています。

免罪符としては機能しない美少女キャラクターとプライバシー

次に検討してみたいのが、博士の「日本人男性は美少女になりカワイイを獲得することでストレスから開放される。なぜならカワイイは失敗しても許されるし責任を負わなくていいから。」という発言です。わたしの解釈違いかもしれませんが、カワイイが免罪符的に機能するので、日本人男性はカワイイを獲得したいと言っているように聞こえます。「日本人男性は」としているので、この「カワイイ」の概念とその通用する範囲が日本にまず限定されているのだろうということは想像できます。そのうえで「カワイイ」が免罪符的に機能するのかと言うと、これはちょっと難しいところだと思っています。

「カワイイ」を体現するものとしては、いわゆるアイドルVTuberなどがいると思います。ところが、その免罪符を持っているはずの彼女たちは、時として誹謗中傷されたりコミュニティから叩かれ、「卒業」を余儀なくされたりしているのは、このあたりをちょっと追っている人ならよくご存知のことと思います。もし「カワイイ」が免罪符として機能するならばこういうことは起きないと思うのです。

おそらく理想としての「カワイイ」には人々の期待がつまっていて、その期待を裏切ると叩かれるという構図があるのではないかとも思えます。つまり「カワイイ」が免罪符として機能するわけではない。ある意味、「”カワイイ”という概念を具現化した存在としての美少女キャラクター」は現代の巫女であり、古代と同じく、期待を裏切ったときにはスケープゴートにされる存在だと言えるかもしれません。で、わたしはスケープゴートって作ってはいけないと思うんですよね。

現代の巫女がスケープゴート化してしまう原因の多くは、コンテキスト外の情報の当該コンテキストへの混入です。たとえば、誰かからのチャットが画面に映り込んでしまったとか、他の人の声が混入してしまったとか、自分の顔が反射で映り込んでしまったとかですね。今後は他のワールドでの行動が当該ワールドにレポーティングされてしまうというような形の情報混入も起きてくることでしょう。場合によっては悪意の人が監視用のキャラクターを各所に配置して意図的にこれをやってくることも考えられます。中の人と、その人が運用する異なるアバターの名寄せとかもこの中に入ってきます。

番組の中でねむさんは「匿名で活動したい」としてモザイクをかけて声も変えて登場していますが、これも望まない名寄せであるとか情報の混入を防ぐ取り組みの一環と整理することもできます。

こうしたことをどうやって防ぐかとか取り締まっていくかなどが、メタバースにおけるプライバシーの尊重という意味では重要性を増してくると思われます。メタバースプライバシーの国際標準作成作業が始まっている背景でもあります。

しがらみを削ぎ落とした存在としての「ありのままの自分=美少女キャラクター」

さて、ではなぜ日本人男性はバ美肉化するのか?上記のことを勘案すると、博士のいう「日本人男性は美少女になりカワイイを獲得することでストレスから開放される。なぜならカワイイは失敗しても許されるし責任を負わなくていいから。」というのはちょっと違いそうです。

番組内でねむさんは「”カワイイ”という概念を具現化した存在が美少女キャラクターだと思っている。肩書とか現実世界のしがらみを剥ぎ取ると人間の本質ってすごくカワイイ。魂って実はカワイイと思っていて、カワイイはその人の本質を褒めている気がする。自信になるというか、ありのままでちゃんと意味があるんだと思える気がする。」とおっしゃっておられます。現実世界のしがらみを剥ぎ取った人の本質は「カワイイ」であり、そのありのままの自分の自我「自観」を表出すると、日本文化においては美少女になるということでしょうか。そうかもしれません。なにせ、8割近くのひとがそうあろうとするのですから。

番組では、各被取材者ごとに数時間のインタビューがあったのがそれぞれ5分くらいにまとめられてしまっているようです。削ぎ落とされたものの中に、このあたりに光を当ててくれる発言も多くあったことでしょう。その意味で、アフタートークも準備されているようなので、それに期待したいと思います。

【ガチ勢各位】
実際にはみんな2 ~3時間喋った内容をそれぞれ一般人向けに5分くらいにまとめられてるので、ガチ勢のみんなとしたはぶっちゃけ物足りない面もあると思う。という訳で、アフタートーク企画中です!!! #最深日本研究 https://t.co/db9ToxqQpl

— バーチャル美少女ねむ/Nem4/13 NHK「最深日本研究」出演! (@nemchan_nel) April 14, 2024

Sunday, 14. April 2024

Ben Werdmüller

I decided to give Tesla FSD a ...

I decided to give Tesla FSD a second chance today. Aside from the bit where it decided to turn onto an actual municipal train track and use it as a road, it did really well!

I decided to give Tesla FSD a second chance today. Aside from the bit where it decided to turn onto an actual municipal train track and use it as a road, it did really well!


Kent Bull

KERI Update: Major release and Branch Strategy Change

The WebOfTrust community recently released a major update to the Key Event Receipt Infrastructure (KERI) and Authentic Chained Data Containers (ACDC) implementation as a coordinated release across the suite of WebOfTrust repositories. Coordinated Release This resulted in the following release versions: Branch Strategy Change Recent Blog Posts See Nuttawut Kongsuwan’s […]

The WebOfTrust community recently released a major update to the Key Event Receipt Infrastructure (KERI) and Authentic Chained Data Containers (ACDC) implementation as a coordinated release across the suite of WebOfTrust repositories.

Coordinated Release

This resulted in the following release versions:

Repository: KERIpy (used for witnesses, command line-managed decentralized identities, and as the core library for the Python KERY space)
Version: 1.1.12 (PyPi) – Apr 9, 2024 Repository: KERIA (agent server used for individual decentralized identity agents – your data sits here)
Version: 0.2.0.dev0 Pre-Release (PyPi) – Apr 11, 2024 Repository: SignifyPy (edge-signing client implementation – your keys sit here)
Version: 0.1.0 (PyPi) – Feb 13, 2024 Branch Strategy Change “development” branch merged to “main” as the old GitFlow style branching strategy was dropped in favor of trunk-based development (single main branch strategy). See the keripy#726 GitHub discussion for the rationale. This occurred for the following repositories: Python KERIpy KERIA SignifyPy Typescript SignifyTS

Recent Blog Posts

See Nuttawut Kongsuwan’s explanation of how to use KERI in his “The Hitchhiker’s Guide to KERI. Part 3: How do you use KERI?


Simon Willison

redka

redka Anton Zhiyanov's new project to build a subset of Redis (including protocol support) using Go and SQLite. Also works as a Go library. The guts of the SQL implementation are in the internal/sqlx folder. Via @ohmypy

redka

Anton Zhiyanov's new project to build a subset of Redis (including protocol support) using Go and SQLite. Also works as a Go library.

The guts of the SQL implementation are in the internal/sqlx folder.

Via @ohmypy

Saturday, 13. April 2024

Ben Werdmüller

The thing about most war commentary on ...

The thing about most war commentary on social media is that it's speculative, almost on an entertainment level. Every time, there are people who bear the cost of this, who didn't ask for it, who don't endorse it, and yet will still pay an unimaginable price. It's described as points-scoring but it's death and suffering, children and families and innocent human beings, and their descendants, and th

The thing about most war commentary on social media is that it's speculative, almost on an entertainment level. Every time, there are people who bear the cost of this, who didn't ask for it, who don't endorse it, and yet will still pay an unimaginable price. It's described as points-scoring but it's death and suffering, children and families and innocent human beings, and their descendants, and theirs, and so on, for generations. There is no glory, there is no validity. It's sick.


Simon Willison

Lessons after a half-billion GPT tokens

Lessons after a half-billion GPT tokens Ken Kantzer presents some hard-won experience from shipping real features on top of OpenAI's models. They ended up settling on a very basic abstraction over the chat API - mainly to handle automatic retries on a 500 error. No complex wrappers, not even JSON mode or function calling or system prompts. Rather than counting tokens they estimate tokens a

Lessons after a half-billion GPT tokens

Ken Kantzer presents some hard-won experience from shipping real features on top of OpenAI's models.

They ended up settling on a very basic abstraction over the chat API - mainly to handle automatic retries on a 500 error. No complex wrappers, not even JSON mode or function calling or system prompts.

Rather than counting tokens they estimate tokens as 3 times the length in characters, which works well enough.

One challenge they highlight for structured data extraction (one of my favourite use-cases for LLMs): "GPT really cannot give back more than 10 items. Trying to have it give you back 15 items? Maybe it does it 15% of the time."

(Several commenters on Hacker News report success in getting more items back by using numbered keys or sequence IDs in the returned JSON to help the model keep count.)

Via Hacker News


Michael Ruminer

Refocusing a Career

Now for something entirely different. I write only occasionally on a topic of interest to me, verifiable credentials. Today it is something entirely different, my career. Most specifically, on how it may be time for a refocus. A movement from software development life-cycle process back toward a new more technical focus, AI. I have, in some ways, refocused my career a couple of times. It was

Now for something entirely different. I write only occasionally on a topic of interest to me, verifiable credentials. Today it is something entirely different, my career. Most specifically, on how it may be time for a refocus. A movement from software development life-cycle process back toward a new more technical focus, AI.

I have, in some ways, refocused my career a couple of times. It wasn’t a full restart or anything; just moving into new directions within software development. Some were more a natural progressions than a refocus. This time it would certainly be a refocus.

What I do and what I have done?

For the last several years I have been a software technical release manager focused on not only getting software out the door but on process improvement. That is not where I started my career. It all started as a system admin, then a programmer, then getting involved in migrations and software develpment for mitigating Y2K, on to a software team lead. Then came the first refocus, I got deeply invested in a new Microsoft product Team Foundation Server. This took me down a fun rabbit hole where I spent some years consulting on implementations and use of the product. I spent a few years of my consulting as a Micrsoft MVP on the product, which was very rewarding. Ultimately, with a focus on version control, automated builds and deploys, and release processes with the product and in general, I made the next refocus. I moved into release management. Decidely less technical, but something I have now enjoyed for a number of years. I find the cross-functional nature rewarding and plenty of opportunity to still practice process improvement.

Why the consideration of a refocus?

I see it as a dying domain. I have found that the role is late to join organizations and more easily released from those same organizations. In my experience, companies find unexpected value in the role and can really enjoy having the “glue” bring higher quality and improved flow to production and beyond, but “enjoy” is too much of an operative word here. When belts are tightened it is one of the first roles to go — seen as a luxury not a requirement. I also see diminishing new opportunities in the marketplace. All this places downward pressure on compensation. Additionally, upward progression from the role is often difficult. So it may be time for a full refocus, again.

Too technical for the role?

It wasn’t too long ago I was told as a final feedback for a failed pursuit of an open release management position:

You are too technical.

Hmmm… I knew what they meant, but I had never had my quals work against me- not that I know of. Most find it to be a boon. It made me think. With all the considerations of why there might need to be a refocus, is it time to shift my momentum?

Where to next?

I really enjoy what I do and would hate to give it up but the writing might be on the wall. So, as I survey the landscape, I say to myself “what might be next”. Where is the best opportunity to delve into new things and make an impact for organizations? Is it the AI domain? Can I go from zero to hero? What does that look like? How long would it take and what path is there? I see a lot of potential in Microsoft’s Azure AI offerings and the ones that will inevtiably come down the line. Plus, I have a long history in the Microsoft stack. Let’s see where the next few years take me. Regardless of anything, I’ll certainly be digging deeper into the practical applications and implementations of the technology all why continuing my enjoyable current role in software life cycle.

Back to a common topic

Now, if I can come up with an AI realm play that ties in verifiable, portable, credentials. What will be bigger in the future than identity and AI? Nothing.

Friday, 12. April 2024

Simon Willison

How we built JSR

How we built JSR Really interesting deep dive by Luca Casonato into the engineering behind the new JSR alternative JavaScript package registry launched recently by Deno. The backend uses PostgreSQL and a Rust API server hosted on Google Cloud Run. The frontend uses Fresh, Deno's own server-side JavaScript framework which leans heavily in the concept of "islands" - a progressive enhancement

How we built JSR

Really interesting deep dive by Luca Casonato into the engineering behind the new JSR alternative JavaScript package registry launched recently by Deno.

The backend uses PostgreSQL and a Rust API server hosted on Google Cloud Run.

The frontend uses Fresh, Deno's own server-side JavaScript framework which leans heavily in the concept of "islands" - a progressive enhancement technique where pages are rendered on the server and small islands of interactivity are added once the page has loaded.

Via Hacker News


Mike Jones: self-issued

Fully-Specified Algorithms Presentation at 2024 OAuth Security Workshop

I gave a presentation on Fully-Specified Algorithms for JOSE and COSE at the 2024 OAuth Security Workshop in Rome. The slides used to update participants on the progress of the work are available as PowerPoint and PDF. Thanks to the organizers for another great OAuth Security Workshop! And special thanks to the colleagues from Fondazione […]

I gave a presentation on Fully-Specified Algorithms for JOSE and COSE at the 2024 OAuth Security Workshop in Rome. The slides used to update participants on the progress of the work are available as PowerPoint and PDF.

Thanks to the organizers for another great OAuth Security Workshop! And special thanks to the colleagues from Fondazione Bruno Kessler who did a great job with local arrangements in Rome!


COSE “typ” (type) Header Parameter Specification in RFC Editor Queue

I’m pleased to report that the COSE “typ” (type) Header Parameter Specification has been approved by the IESG and is now in the RFC Editor queue. The version approved by the IESG and sent to the RFC Editor is: https://www.ietf.org/archive/id/draft-ietf-cose-typ-header-parameter-05.html It joins CBOR Web Token (CWT) Claims in COSE Headers in the RFC Editor queue. […]

I’m pleased to report that the COSE “typ” (type) Header Parameter Specification has been approved by the IESG and is now in the RFC Editor queue.

The version approved by the IESG and sent to the RFC Editor is:

https://www.ietf.org/archive/id/draft-ietf-cose-typ-header-parameter-05.html

It joins CBOR Web Token (CWT) Claims in COSE Headers in the RFC Editor queue. Because of the reference to this spec by CWT Claims in Headers, they form a cluster, and therefore will become RFCs at the same time.


Simon Willison

Quoting David Pierce

The language issues are indicative of the bigger problem facing the AI Pin, ChatGPT, and frankly, every other AI product out there: you can’t see how it works, so it’s impossible to figure out how to use it. [...] our phones are constant feedback machines — colored buttons telling us what to tap, instant activity every time we touch or pinch or scroll. You can see your options and what happens wh

The language issues are indicative of the bigger problem facing the AI Pin, ChatGPT, and frankly, every other AI product out there: you can’t see how it works, so it’s impossible to figure out how to use it. [...] our phones are constant feedback machines — colored buttons telling us what to tap, instant activity every time we touch or pinch or scroll. You can see your options and what happens when you pick one. With AI, you don’t get any of that. Using the AI Pin feels like wishing on a star: you just close your eyes and hope for the best. Most of the time, nothing happens.

David Pierce

Thursday, 11. April 2024

Simon Willison

3Blue1Brown: Attention in transformers, visually explained

3Blue1Brown: Attention in transformers, visually explained Grant Sanderson publishes animated explainers of mathematical topics on YouTube, to over 6 million subscribers. His latest shows how the attention mechanism in transformers (the algorithm behind most LLMs) works and is by far the clearest explanation I've seen of the topic anywhere. I was intrigued to find out what tool he used to pro

3Blue1Brown: Attention in transformers, visually explained

Grant Sanderson publishes animated explainers of mathematical topics on YouTube, to over 6 million subscribers. His latest shows how the attention mechanism in transformers (the algorithm behind most LLMs) works and is by far the clearest explanation I've seen of the topic anywhere.

I was intrigued to find out what tool he used to produce the visualizations. It turns out Grant built his own open source Python animation library, manim, to enable his YouTube work.


Patrick Breyer

EU-Parlament stimmt für Verlängerung der flächendeckenden Chatkontrolle durch US-Internetkonzerne

Gestern Abend haben die Abgeordneten des EU-Parlaments mehrheitlich für die Verlängerung der umstrittenen flächendeckenden freiwilligen Chatkontrolle 1.0 durch US-Internetkonzerne wie Meta (Instagram, Facebook), Google (GMail) und Microsoft (X-Box) bis April 2026 …

Gestern Abend haben die Abgeordneten des EU-Parlaments mehrheitlich für die Verlängerung der umstrittenen flächendeckenden freiwilligen Chatkontrolle 1.0 durch US-Internetkonzerne wie Meta (Instagram, Facebook), Google (GMail) und Microsoft (X-Box) bis April 2026 gestimmt. Von den deutschen EU-Abgeordneten unterstützten Union und SPD die Verlängerung, während Piraten, Grüne, FDP, Linke sie ablehnten und die AfD sich enthielt.

Der Europaabgeordnete der Piratenpartei und digitale Freiheitskämpfer Dr. Patrick Breyer, der gegen die eigenmächtige Chatkontrolle vor Gericht klagt, kritisiert: 


„Das EU-Parlament will von den grundrechtswidrigen flächendeckenden Chatkontrollen weg, mit dem gestrigen Beschluss zementiert es sie aber für Jahre. Das EU-Parlament will einen viel besseren und gerichtsfesten Schutz vor Kindesmissbrauch im Netz, mit dem gestrigen Beschluss wird aber überhaupt nichts zum besseren Schutz unserer Kinder erreicht. Mit so wenig Rückgrat werden immer weitere Verlängerungen des Status Quo der Massenüberwachung folgen und ein besserer Schutz von Kindern immer unwahrscheinlicher. Missbrauchsopfer haben besseres verdient! 



EU-Kommission, EU-Regierungen und einem internationalen überwachungsbehördlich-industriellen Netzwerk ist es leider gelungen, der Parlamentsmehrheit Angst vor einer vermeintlichen ‚Schutzlücke‘ durch Wegfall der flächendeckenden freiwilligen Chatkontrolle 1.0 zu machen. In Wahrheit leistet die freiwillige Massenüberwachung unserer persönlichen Nachrichten und Fotos durch US-Dienste wie Meta, Google oder Microsoft keinen signifikanten Beitrag zur Rettung missbrauchter Kinder oder Überführung von Missbrauchtätern, sondern kriminalisiert umgekehrt tausende Minderjähriger, überlastet Strafverfolger und öffnet einer willkürlichen Privatjustiz der Internetkonzerne Tür und Tor. Wenn nach eigenen Angaben der EU-Kommission nur jede vierte Meldung überhaupt für die Polizei relevant ist, bedeutet das für Deutschland Jahr für Jahr 75.000 ausgeleitete intime Strandfotos und Nacktbilder, die bei unbekannten Moderatoren im Ausland nicht sicher sind und in deren Händen nichts zu suchen haben.

Die Verordnung zur freiwilligen Chatkontrolle ist sowohl unnötig als auch grundrechtswidrig: Die sozialen Netzwerke als Hostingdienste brauchen zur Überprüfung öffentlicher Posts keine Verordnung. Dasselbe gilt für Verdachtsmeldungen durch Nutzer. Und die fehleranfälligen automatisierten Meldungen aus der Durchleuchtung privater Kommunikation durch Zuckerbergs Meta-Konzern, die 80% der Chatmeldungen ausmachen, werden durch die angekündigte Einführung von Ende-zu-Ende-Verschlüsselung ohnehin im Laufe des Jahres entfallen.

Als Pirat arbeite ich daran, die eigenmächtige Chatkontrolle als verdachtslose und flächendeckende Überwachungsmaßnahme vor Gericht stoppen zu lassen. Gleichzeitig werden wir Piraten im EU-Parlament gegen alle Versuche kämpfen, doch noch Mehrheiten im EU-Rat für die extreme Dystopie verpflichtender Chatkontrolle 2.0 zur Zerstörung des digitalen Briefgeheimnisses und sicherer Verschlüsselung zu finden.“

Breyer kündigte gestern an, gegen die erstinstanzliche, formal begründete Abweisung seiner Klage gegen die Chatkontrolle in Berufung zu gehen. Unterdessen wollen die EU-Regierungen bis Juni darüber entscheiden, die Chatkontrolle für alle Dienste verpflichtend zu machen, selbst für bisher sicher Ende-zu-Ende-verschlüsselte Messengerdienste.


Simon Willison

Use an llm to automagically generate meaningful git commit messages

Use an llm to automagically generate meaningful git commit messages Neat, thoroughly documented recipe by Harper Reed using my LLM CLI tool as part of a scheme for if you're feeling too lazy to write a commit message - it uses a prepare-commit-msg Git hook which runs any time you commit without a message and pipes your changes to a model along with a custom system prompt.

Use an llm to automagically generate meaningful git commit messages

Neat, thoroughly documented recipe by Harper Reed using my LLM CLI tool as part of a scheme for if you're feeling too lazy to write a commit message - it uses a prepare-commit-msg Git hook which runs any time you commit without a message and pipes your changes to a model along with a custom system prompt.


Quoting Andrej Karpathy

[on GitHub Copilot] It’s like insisting to walk when you can take a bike. It gets the hard things wrong but all the easy things right, very helpful and much faster. You have to learn what it can and can’t do. — Andrej Karpathy

[on GitHub Copilot] It’s like insisting to walk when you can take a bike. It gets the hard things wrong but all the easy things right, very helpful and much faster. You have to learn what it can and can’t do.

Andrej Karpathy

Wednesday, 10. April 2024

Simon Willison

Shell History Is Your Best Productivity Tool

Shell History Is Your Best Productivity Tool Martin Heinz drops a wealth of knowledge about ways to configure zsh (the default shell on macOS these days) to get better utility from your shell history. Via lobste.rs

Shell History Is Your Best Productivity Tool

Martin Heinz drops a wealth of knowledge about ways to configure zsh (the default shell on macOS these days) to get better utility from your shell history.

Via lobste.rs


Notes on how to use LLMs in your product

Notes on how to use LLMs in your product A whole bunch of useful observations from Will Larson here. I love his focus on the key characteristic of LLMs that "you cannot know whether a given response is accurate", nor can you calculate a dependable confidence score for a response - and as a result you need to either "accept potential inaccuracies (which makes sense in many cases, humans are wrong

Notes on how to use LLMs in your product

A whole bunch of useful observations from Will Larson here. I love his focus on the key characteristic of LLMs that "you cannot know whether a given response is accurate", nor can you calculate a dependable confidence score for a response - and as a result you need to either "accept potential inaccuracies (which makes sense in many cases, humans are wrong sometimes too) or keep a Human-in-the-Loop (HITL) to validate the response."


Quoting Will Larson

The challenge [with RAG] is that most corner-cutting solutions look like they’re working on small datasets while letting you pretend that things like search relevance don’t matter, while in reality relevance significantly impacts quality of responses when you move beyond prototyping (whether they’re literally search relevance or are better tuned SQL queries to retrieve more appropriate rows). Thi

The challenge [with RAG] is that most corner-cutting solutions look like they’re working on small datasets while letting you pretend that things like search relevance don’t matter, while in reality relevance significantly impacts quality of responses when you move beyond prototyping (whether they’re literally search relevance or are better tuned SQL queries to retrieve more appropriate rows). This creates a false expectation of how the prototype will translate into a production capability, with all the predictable consequences: underestimating timelines, poor production behavior/performance, etc.

Will Larson


David Kelts on ID

“Ripping IDs” Pt 2: The Unwinnable Cost Problem of Identity Verification

In a story on “Ripping IDs Should Go the Way of CDs”, I wrote about the questionable accuracy, usability, and privacy of “ripping” analog IDs into the digital space — a/k/a Mobile Know Your Customer (mKYC). Now I‘ll analyze the “ $1 cost problem” and you’ll see why important steps in mKYC are being skipped. Analog to Digital. Cost, accuracy, usability, and privacy problems of Mobile ID Verifi
In a story on “Ripping IDs Should Go the Way of CDs”, I wrote about the questionable accuracy, usability, and privacy of “ripping” analog IDs into the digital space — a/k/a Mobile Know Your Customer (mKYC). Now I‘ll analyze the “ $1 cost problem” and you’ll see why important steps in mKYC are being skipped. Analog to Digital. Cost, accuracy, usability, and privacy problems of Mobile ID Verification (image cidaas.com)

Services that require identity verification have become increasingly dependent on “scan the front, scan the back, take a selfie, match” for everything from renting a sidewalk scooter to opening bank accounts to provisioning your mobile driver’s license. The cost of inaccuracy in each situation varies greatly. The practical amount of money to spend within these use cases doesn’t — from 0 to 25 cents. Yet most services cost $1. Startups eat the difference hoping the future cost will go down (sorry, it won’t). Many businesses eliminate steps to keep costs down but still roll out a high-demand feature. Avoid offering mobile services that depend on ID and you fall behind. Do it, and accept the risk of fraudulent inputs.

There are both financial risk mitigation use cases and government identity use cases that are currently utilizing mKYC. I’d like to separate these concepts (read to the end to see why).

Note: I do not use the overloaded term document verification. I specifically split document verification into two distinct steps: document authentication (is it from a trusted issuer) and document validation (is the data on it accurate and the account behind it still valid). (See also definitions NIST 800–63). Both of these would be resolved with digital government-signed identity documents.
Cost Analysis of mKYC
Note: I invite those with more insider knowledge of the costs of IDV to connect with me to correct my educated summary of costs below.

There are at least 10 mKYC issues that negatively impact user experience, accuracy, and privacy. mKYC is a technology where we haven’t yet been able to achieve Privacy by Design Principle #4 of full functionality without zero-sum tradeoffs. It’s still cost vs. accuracy vs. user experience vs. privacy (spoiler alert: cost wins). Doing analog-to-digital, this won’t ever change.

The Document Authentication Cost Problem
Is the physical document real or fake? How much does it cost to accurately know?

The cost problem here is simply explained. In physical documents, the goal of winning the card security feature race against fraudsters was to make it expensive, in dollars and knowledge, to produce fake IDs. So cards have proprietary visual, tool-based (8x magnify, multi-spectral lighting), and forensic (destroy the card to detect) security features built into their structure, printing, and lamination. Digitally created for analog use... At great expense, including admirable IP, layered over time… but the fraudsters still caught up, able to replicate lower-cost visual features (and therefore make them acceptable by every bar bouncer on earth).

This gave rise to the supremacy of tool-based authentication for high-risk use cases — use an 8x loupe magnifier; combine white, UV, and IR lights (multi-spectral) to detect secrets built or printed into the card; purposeful design flaws, raised surfaces… amazing technology! Until you try to scan and authenticate it with a white-light phone camera held by a human hand.

Now We Expect to Go Back from Analog to Digital?

Try to ingest a secure physical card back into the digital realm and you must try to detect all three levels of physical card security features with a hand-held camera designed for action photos and portraits taken at focal lengths much further away than human arm length. There’s no UV or IR light, no optical magnification, very few steady cams, and no foolproof UX coaching for a person who isn’t aware of what’s being detected.

Do you see where this is taking us? Massive cost, diminishing return, user friction, bad experience… dropoff. What do mKYC purchasers do? Turn features off to avoid paying for document authentication and user friction.

The Document Validation Cost Problem

Once you’ve scanned an ID document, extracted the data from the front with OCR, decoded the barcode from the back, and skipped sending it for document authentication (lol), you’re still left with a few questions…

Is the document still valid? Is the identity data accurate? Are the attributes fresh or have they changed? Can I ask the authoritative source to validate them? How much does it cost me to perform that document data validation?

If you want to establish the provenance, accuracy, and freshness of the data, you need to connect to the source that mitigates your risk level.

At the first level of reliability are “credible sources”. For example, data aggregators charge dimes and quarters for the service of validating data you collected against what they algorithmically believe to be accurate. It’s due diligence on your part. Whether you consider that credible depends on your level of risk versus the cost of service.

At the higher level of reliability are “authoritative sources” — the official issuers of identity documents. Who can connect? Mostly government and law enforcement. You can pay a quarter or two per record to validate data in privacy-preserving manners — yes/no on the accuracy of what you have versus looking up someone’s identity data using an identifier.

See the problem? It costs more just to validate than businesses should logically spend on the whole identity problem. And most don’t have access.

Then there’s the “use at your own risk” level of reliability of unofficial sources (not NIST-defined, surprisingly). The problem with using unofficial sources is that all our data was breached. Assume that to be true — zero trust. Check your mailbox for legal breach notices — confirmed. It’s cheaper to buy that data from the dark web than to use either a credible or authoritative source (if available to you). Advantage fraudsters… and the reason fraudsters can pass Knowledge-Based Authentication more easily than our memories. Skip KBA as an alternative (please).

What’s the point? Eh, given the cost, you’ll skip the document validation step. And you skipped document authentication. So you’re left with having snapped an off-angle photo of that ID document so that you can crop the low-resolution face out of it (resplendent with holograms, security lines, watermarking, and reflective lamination) for biometric matching. Better set that False Accept Rate high enough to get a decent Match Score!

Or you can just produce digital documents in the first place. (See #3 below)
Cost Consolidation

From what I see, there are some alternatives for us to start to win-win the zero-sum problem we are facing.

1. Do this Once at High Cost/Accuracy and Reuse it at Lower Cost

If one analog-to-digital event can be run at a high level of identity assurance(IAL2 minimum) by an authoritative entity, and the event can be re-used, then we start to lower the repeated costs to all future entities. Perhaps then the cost-benefit ratio will make sense. Note that I said an “authoritative” entity and not a credible or trusted entity. I like banks. Banks act in trustworthy ways, perform due diligence when required, and have been willing to step to the forefront in many regions. But if you require a “government-issued ID” for your use case, bank KYC isn’t that. Bank KYC is an Identity Verification not a government signed ID.

A reusable identity solution should be a government-issued, privacy-protecting credential with user authentication and selective disclosure. You might look at mDL as this, however, there’s no standardized mechanism for user authentication yet in ISO/IEC 18013–5 or -7. Even without built-in user authentication, ingesting a digital mDL will short-circuit dozens of my list of 10 problems (pun intended), increase accuracy, and lower cost while allowing fewer fraudulent documents through. That’s why NIST wants to solve this first through the NCCoE.

2. Separate Financial Risk Mitigation from Official Identity

Why this hasn’t happened yet confuses me. Yet we still call it “identity fraud” when criminals are using freely available identity data to perform “financial fraud”.

But seriously, scooter rentals don’t need my ID document, they need to know I can or will pay. Keep going and we will realize the ACLU’s dystopian over-identification state (link will download PDF). None of us want that.

The solution Financial Services needs to look at is reusable user authentication against the KYC they or their partners already performed… without sharing identity attributes that won’t mitigate financial risk.

3. Issue Digitally-Signed Originals

Better yet, just skip the analog to digital steps. Issue and accept digital birth certificates. Issue and accept Mobile or digital ID documents. If you’re not yet accepting digital mDLs, start.

People in the (self-labeled) Self-Sovereign or Decentralized Identity spaces have created the nearly ideal architecture for issuing authoritatively signed, lightweight, digital birth certificates. So far, to my knowledge, only Singapore has jumped on this technology and is realizing the cost savings and accuracy. There’s still one privacy hole to close… whoever controls the distributed ledger can potentially see all authentication and validation requests against the ledger if they control the API layer. This is a solvable technical problem with the architectural concepts of blinding, brokerage, or anonymizing API access (rotating relying party identifiers). This is solvable with policy (link will open video) if not law.

4. Put the User/Holder in Control of their Credentials and Data

When you provide the user (aka citizen, resident, digital subject, holder) with their document in digital form, they gain control over how they share it. EU Digital Identity Wallets are making progress. More importantly, the ease of sharing means that the recipient (relying party) doesn’t need to store identity attributes to mitigate their risk. Ask, verify, dispose. This is a concept of Interactive Information Privacy(TM) that I’ll write about more in the future and that I presented at Cloud Identity Summit in 2017.

Don’t forget to put your reusable identity-proofing credential in that wallet.
Call to Action

We want to stop “ripping IDs” just like we stopped ripping CDs when digital content services became ubiquitous. Digital Identity Services are the next frontier, but everybody is trying to solve everything everywhere all at once. I laid out 4 pathways above, each with existing possible solutions and a legitimate business model to make them self-sustaining. Let’s create actionable, cost-effective solutions to our “identity fraud” problems. The architects have done decent jobs… now the product, market, and business sides should capitalize in better ways than we are currently employing.


Just a Theory

Mini Summit Three

A rough transcript of my Devrim Gündüz’s talk, “Overview of {yum,zypp}.postgresql.org, and the challenges around RPMifying extensions”, along with interspersed chat comments.

Terrific presentation and discussion at last week’s Postgres Extension Ecosystem Mini-Summit. In fact, I later learned that some people couldn’t make it because the Eventbrite event page showed it as sold out!

This is a limitation of the free level at Eventbrite, which caps tickets at 25. But this is a fully-remote event and we can host more people than that. We had over 30 last week! So if you’d like to attend one of the remaining three events and Eventbrite shows it as sold out, have a look at the bookmarks for the #extensions channel on the Postgres Slack, or email the username david at this domain and I will send you the link!

Okay, back to business. Last week, Devrim Gündüz kindly gave a super thorough presentation in his capacity as the maintainer of the Postgres community Yum and ZYpp repositories. This topic sparked lots of interesting discussion and commentary, which I detail below. But why read when you can look?

Video PDF Slides

Still reading? Okay then, here we go!

Introduction I opened the meeting and introduced Devrim Gündüz. Presentation

Thank you for the introduction. Going to talk about Yum and ZYpp dot postgresql.org, these are the challenge for us, and the challenge of making RPMs from extensions.

Work for EDB as Postgres expert, Postgres Major contributor responsible for the Postgres Yum repo. If you’re using RPMs, I’m building them. I also contribute to Rocky, SUSE, and Fedora Linux, proud Postgres community member, live in London, and I’m also doing some DJing, so if I get fired I have an extra job to do.

Agenda: Last year at PGConf.eu in Prague, I had 60 slides for 5 minutes, so expect huge number of slides for today. I want to leave some time for discussion discussion and questions.

I’m going to start with how it started back in the 2000s, talk about the contents of the repos, which distributions we t because that’s another challenge um how do we do the extension packaging how to RPM how to build RPMs of an extension and how to update an extension RPM.

Then the horror story um when what happens when there’s a new Postgres beta is out, which is due in the next next month or so for Postgres 17.

Then we are going to speak about the extension maintainers, problem for us, and then the problems about relying on the external repos.

So if you have any questions please ask as soon as you can. I may not be able to follow the chat during the presentation, but I’ll try as much as I can.

So let’s talk about how it started. When I started using Linux in 1996 um and Postgres in 1998, we always had Tom Lane and we had we had Lamar for for who build built RPMs for RedHta Upstream. So they were just building the distribution packages, nothing community. It was only specific to RedHat — not RedHat Enterprise Linux but RedHat 3-4-5 — not the non-enterprise version of RedHat back then, but it was called it Fedora core back then, the first version was released in November 2003, which was another challenge for packaging

One of the problems with the distribution packaging was that it was slightly behind the minor Postgres releases, sometimes major post releases

So that was one single Postgres version available for a given distro, say Postgres 6.4 or something, 7.0, and multiple versions were not available back then, and the minor version was slightly behind.

I started building packages for my laptop because I wanted to use Postgres and not all versions wer available back then. So I started building packages for my laptop and my server. They were based on the packaging of Tom Lane and Lamar.

Then I uploaded them to my personal server and emailed the PG mailing lists lists and said, “I’m running on own packages, use at your own risk. This is just a rebuild of the upstream packages on the RedHat version that I’m using or the Fedora version that I’m using. Up to you! This is a personal project, nothing serious.”

So then then people started downloading them, and using them. There was no signature, nothing official back then. Then Lamar said he didn’t have enough time for the RPMs. He sent an email to the mailing lists and I said Devrim is stepping up to the plate, and I did it. So that was I think in 2004, about which is about 20 years ago.

So 19 years ago we had the first domain, postgresql.rpm.org, and then we had more packages. In 2007 we had the first repository RPM and then we had yum.postgresql.org. This means that, starting 2007, this began to be the official RPM repo of the community, which which was a good thing because we could control everything under the Community.

I was living in Canada back then. We had the first RPM build farm — instead of using my laptop and my VMs — we had the actual machine back then.

In 2010 we had what was then called multiple postmaster support, which means the parallel installation of the Postgres RPMs. That was a that was a revolution because even still, the current Fedora or RedHat RPMs cannot be installed in parallel. So if you want to install post 13, 14, 15, 16, and 17 or whatever, it is available in the community repo. This is a great feature because you may want to test or do something else. This is how we how we started, 14 years ago we had this feature in the community repo.

Then next year we moved the repo to the community servers and unified the spec files. Our builds are extremely simple — like you can start your own builds in less than 10 minutes: five minutes for pulling the git repo and then a few minutes for for a package. Extremely simple builds, and now we are in 2024

Let’s talk about the contents, because we have four different contents for the repo. The first one is the “common” repo. We call it “common” in the repository config file. It has the RPMs that work across all Postgres major releases. Let’s say pg_badger, or the ODBC driver, JDBC driver, or GDAL for PostGIS. There’s lots of stuff: Python packages, which are not extensions but we like Patroni, which actually works for all Postgres releases.

This was an effort to get rid of duplicates in the Postgres repo. I think we shaved lots of gigabytes back then, and we still save a lot

Then, to address one of the topics of today’s talks, we have the “non-common” RPMs. (These are the name of the directories, by the way.) They are the RPMs which are Postgres version-specific, basically they are extensions. Say plpgsql_check 15 or whatever. Lots of extensions are there.

Then we have extras. They are not actually Postgres packages, they shouldn’t be included in our repo by default, but many people actually look for these packages because they they want to use Patroni and they don’t have the supporting RPMs or supporting RTM RPMs, or they’re not up-to-date.

I’m building a console, console-template, ETCD, HAProxy keepalived and vip-manager. They are all open source, some of them are Go packages — which, actually, I don’t build them, I just distribute the precompiled binaries via the repo. So that makes easier for people to deploy the packages.

And then we have the “non-free” repo. These are the packages that depend on closed-source software like Oracle libraries, or that have license restrictions. For example, ora2pg depends on Perl DBD::Oracle, oracle_fdw depends on Oracle, pg_storm depends on Cuda Nvidia stuff, timescaledb-tsl actually is Timescale DB with the TSL license, informix_fdw and db2_ftw.

So we have some non-free packages which actually depend on non-free stuff as well. All of them are well-maintained: I’m trying to keep everything up to date — like real up-to-date! That brings some problems but we will get there.

We support RedHat Enterprise Linux and of course Rocky Linux. This year we started supporting Alma Linux as well. Of course they are more or less identical, but we test them, install, and support to verify the packages in these three distributions.

We have x86_64 aarchm64, ppc64le, and RedHat 9, 8, and 7. We have also RedHat 6 for Postgres 12, but it’s going to be killed by the end of this year. We have Fedora, but only two major releases, which matches the Fedora lifecycle. And SUSE — my pain in the neck — um I’ll get there.

Since you all are here for extension packaging, let’s get there: what happens for extension packaging.

First of all, we have the first extension, which is the in-core extensions. They are delivered with the contrib sub-package, which matches the directory name in The Postgres tarball. There are separate packages for each major version, so postgres15-contrib, postgres13-contrib, etc. These are the installation directories for each extension. We are going to use those directories for the other [non-cre] extensions as well.

When we add a new extension to the repo, it’s going to use these directories if they have a binary or if they an extension config file, if the library or the mem files — all are going are all installed under these directories. This magic is done by PGXS, which has been there forever. We just provide initial stuff and then the rest is done by the the PGXS magic. This is the base for a lot of the core extensions.

So what happens when we do non-common and non-free package? First of all, they are distributed separately for each Postgres major version. Let’s go back to the one of the extensions, say plpgsql_check. We have a separate package for Postgres 14, a separate package for Postgres 15, 16, 13, and 12. If they build against all the supported Postgres versions, we have separate packages for each.

Of course from time to time — as far as I remember Timescale DB does this —- Timescale DB only supports Postgres 15 and 16 nowadays. So we drop support for the older versions in the Git repo. But they are separate packages; they are all installed in these directories along with the main contrib package.

This is the naming convention that I use: $extensionName_PGMajorVersion. Four or six years ago, some of the packages didn’t have an underscore before the PG major version. It was a problem, so someone complained in the hackers mailing list, and then I made the changes.

Currently all the previous and all the new packages have this naming convention except a few packages like PostGIS — because in PostGIS we have multiple versions of the same extension. let’s say we have PostGIS 3.3, 3.2, 3.4, 3.1, 3.0. We have combinations of each — I think we have PostGIS 3.0 in some of the distributions but mostly PostGIS 3.1, 3.2, 3.3, and 3.4, and then all the Postgres versions they support — A lot of builds! So there are some exceptions where we have the extension name and extension major version before the Postgres major version.

Jeremy S in chat: question: curious if you have thoughts about relocatable rpm packages. ever tried? just best to avoid?

I have a question from Jeremy. This is a very basic question to answer. This is actually forbidden by the packaging rules. The RPM packaging rules forbid you to distribute or create relocatable RPM packages. We we stick to the packaging guidelines, so this this cannot be done.

Jeremy S in chat: Thanks! (Didn’t realize packaging guidelines forbid this)

Let’s talk about how we build extensions. Often our develop package is enough: many of our packages just just rely on on Postgres itself. But of course packages like PostGIS may depend on some other packages, like GDAL, which requires lots of lots of extra dependencies as well. The most problematic one is the GIS Stack: they need EPEL on RHEL and RHEL and its derivatives.

There there has been a discussion in the past about whether should require EPEL by default. The answer is “no,” because not all not all of our users are installing, for example, the GIS stack or other packages. Most of our users — not the majority of our users —- rely on the um rely on just our repo.

On the other hand, in order to provide completeness for our users, I added lots of python packages in the past to support Patroni — because the upstream packages (I’m sorry not maybe upstream packages) were not enough. The version wasn’t enough or maybe too low. So From some time to time I add non Postgres-related packages to the repo just to support the Postgres package. In the past it was PGAdmin, but now it’s not in our repo so it’s not a problem: their upstream is building their own RPMs, which is a good thing for us. We are building extra packages mostly for Patroni.

However, this is a potential problem for some enterprise users because large companies don’t even want to use the EPEL repo because they feel like it’s like it’s not a community repo, but a community repo controlled by Fedora and RedHat. That’s why from time to time I try to add some of the packages to our repo.

If it’s a problem for enterprise users, does it mean we should we maintain tons of extra packages in the EPEL repo for the GIS stack? The answer is “no, definitely no”. Not just because of the human power that we need to maintain those those packages — I mean rebuilding them is easy: I just get the source RPM, commit the spec files into our repo, and rebuild them. But maintaining them is something else.

We will have a similar problem when we release our own ICU package in the next few years. Because, now that we have the in core collation — but just for C Locale —- and people are trying to get rid of glibc, maybe we should have an ICU package. But maintaining an ICU Library across a single Postgres major version is a real big challenge that I don’t know how to solve for now, at least.

And then SLES — my pain in the neck — the GIS stack requires lots of repos on SLES 15. They are they are well documented on on our website.

Fedora is safe because Fedora is Fedora, everything is there, it’s easy to get a package there.

Jeremy S in chat: “Building them is easy. Maintaining them is something else.”

Yeah that’s that’s the problem, Jeremy.

So how do you RPMify an extension?

The first thing is to know that the extension exists. This is one of the big problems between developers and users. The developer creates a useful extension and then they don’t create a tarball, they don’t release anything. They expect people to install Git on their production databases and git pull, install make, gcc, all the development libraries and build a binary, blah, blah blah.

I’m sorry that’ss not going to work. It also doesn’t work for pip — pip is not a package manager, it just destroys your server. It downloads things to random places and then everything’s gone. That’s why I added lots of Python packages to support Patroni, because most of the users use the packaging package manager to install Postgres and other packages to their servers. It’s the same for Debian, Ubuntu, for RPMs, for Windows, for macOS.

So first of all we know have to know that the extension exists and we have to have a tallbal. If I see an extension that seems good enough I’ll get there. PGXN is a good place, because when I go to pgxn.org a few times per day and see if there is a new version of an extension or if there’s new extension, it’s a good piece. But there’s a problem: we have hundreds of extensions — maybe thousands — but not all of them are on PGXN. They should be!

David Wheeler (he/him) in chat: You should follow @pgxn@botsin.space for new releases :-)

nils in chat: pg_xz incoming

I don’t know how to solve this problem, but we should expect every extension to announce their extensions on PGXN. I’m not just talking about installing everything through PGXN, but at least have an entry that there’s a new extension, this is repo, the website, the readme and the is a tarball. It doesn’t have to be on PGXN, as long as we have something.

And then I check the version. If there is an extension that will kill your database and the version is 0.001, that’s not going to be added to the repo, because we don’t want to distribute an experimental feature.

David Wheeler (he/him) in chat: LOL, all my extensions start at 0.0.1

David G. Johnston in chat: If it isn’t on PGXN it doesn’t exist.

Another problem is that lots of people write extensions but some of them are just garbage. I’m sorry but that’s the truth. I mean they just release a version and then do nothing.

David Wheeler (he/him) in chat: LOLOLOL

From the chat, “pgxn_xz is coming”: that’s right! We have blackhole_fdw, which was written by Andrew Dunstan. When you create blackhole_fdw, it throws all of your data into black hole, so everything is gone.

Yeah, “if it’s not on PGXN it doesn’t exist,” that’s what I hope we achieve achieve in the next year or so.

Jimmy Angelakos in chat, replying to “If it isn’t on PGXN …”: I would correct that to “If it isn’t on PGXN it isn’t maintained.” Sometimes even ON PGXN…

Yeah Jimmy, that’s one of the big problems that we have: maintenance.

We create the spec file, just copy one of the existing ones and start editing. It’s easy but sometimes we have to add patches. We used to carry patches for each Postgres major version to change the Makefiles for the specific Postgres major version. But I realized that it was [not a great pattern]. Now we just export the path, which fixes the problem.

David G. Johnston in chat: As a policy though, someone who cares and wants responsibility needs to apply to be part of the RPM community.

Then I initiate a scratch build for any missing requirements. If there are any missing build requirements it fails to build. I only do it on Fedora latest, not for every package because it doesn’t always help because some distros may not have the missing dependency

Alvaro Hernandez in chat: Hi everybody!

David G. Johnston in chat: Delegated to PGXN for that directory.

Let’s say we rely on some really good feature that comes with a latest version of something, but that latest version may not appear in RedHat 7 or 8. So some dist dros may have it, but the version may be lower than required. Or some distros may have the dependency under different name. Now in the spec file we have “if SUSE then this” and “if RedHat then this” “if Fedora then”, “if RedHat nine then this”, etc. That’s okay, it’s expected. As long as we have the package, I don’t care.

Then I push it to the Git repo, which I use not just for the spec files and patches, but also for carrying the spec files and pitches to the build instances.

Jorge in chat: How to handle extension versioning properly? I mean, in the control file the version could be anything, like for ex. citus default_version = ‘12.2-1’ where the “published version” is v12.1.2, then the “default version” could remain there forever.

Also have seen in the wild extensions that the control file have a version 0.1 (forever) and the “released” version is 1.5

If something fails I go back to the drawing board. GCC may fail (gcc 14 has been released on Fedora 40 and is causing lots of issues for for packaging nowadays), it could be cmake — too recent or too old. It could be LLVM — LLVM18 is a problem for Postgres nowadays. I either try to fix it ping upstream. I often ping upstream because the issue must be fixed anyway

If everything is okay, just push the packages to the repo.

Ruohang Feng (Vonng) in chat: question: how about adding some good extensions written in Rust/pgrx to the repo? like pgml, pg_bm25, pg_analytics, pg_graphql….

One issues is that there is no proper announcement. Maybe I have an awesome extension available in the Postgres repo that people crave and, we build the extensions, it took a lot of time (thank you Jimmy, he helped me a lot) and then I didn’t actually announce it that much. On the other hand, people just can use PG stat base [?] to install and start using it in a few seconds. This is something that we should improve.

Steven Miller in chat: How to handle system dependencies like libc versions being updated on the target system? Do extensions need to be complied against exactly a specific libc version?

From Nevzat in chat: how can we make sure bugfix or extension is safe before installing it

vrmiguel in chat: Interesting approach to identify build/runtime requirements

Over at Tembo we have a project called trunk-packager which attempts to automatically create .deb packages given the extension’s shared object.

We try to identify the dynamic libs the extension requires by parsing its ELF and then trying to map the required .so to the Debian package that provides it, saving this info in the .deb’s control file

From the chat: How to handle extension versions properly? That’s a good thing but, extension version and the release version don’t have to match. Thr extension version isn’t the same thing as the release version. It’s the version of the SQL file or the functions or the tables, the views, sort procedures, or whatever. If it’s 0.1 it means it’s 0.1 it means nothing nothing has changed in this specific regarding the control file. They they may bump up the package version because they may add new features, but if they don’t add new features to the SQL file, then they don’t update the extensions. I hope that answers your question George

I have another question from Ruohang. Yaaaaay! I was afraid that someone would ask that one. We have no extensions written in Rust in repo so far. It’s not like Go; there is a ban against Go because we don’t want to download the world, all the internet just to build an extension. If I recall correctly they’re rewriting pg_anonymizer in Rust. They will let me know when they release it or they’re ready to release it, and then I’ll build it. It’s not something I don’t like, it just hasn’t happened.

Keith Fiske in chat: I still update the control file with my extensions even if it’s only a library change. Makes it easier to know what version is installed from within the database, not just looking at the package version (which may not be accessible)

Ruohang Feng (Vonng) inchat: question: How to handle RPM and extension name conflicts, e.g., Hydra’s columnar and Citus’s columnar.

*David Wheeler (he/him) in chat, replying to “I still update the c…” Yeah I’ve been shifting to this pattern, it’s too confusing otherwise

If you think there are good extensions like these, just create a ticket on redmine.postgresql.org. I’m happy to take a look as long as I know them. That’s one of the problems: I have never heard about pg_analytics or pgml, because they’re not on PGXN. Or maybe they are. This is something that we should improve in the next few months.

Jimmy Angelakos in chat: Go is a pretty terrible ecosystem. It has its own package manager in the language, so it’s kinda incompatible with distro packages

Jason Petersen in chat: (but that means a build is safe within a single release, it doesn’t mean you can move a built extension from one Fedora version to another, right?)

David Wheeler (he/him) in chat, replying to “How to handle system…”: Libc is stable in each major version of the OS, and there are separate RPMs for each.

Another question from Steven: how to handle system dependencies like libc version updates. The answer is no. It’s mostly because they don’t update the libc major version across the across across the lifetime of the of the release. So we don’t need to rebuild the extension against libc.

Steven Miller in chat, replying to “How to handle system…”: Ok I see, TY

Jason Petersen in chat, replying to “How to handle system…”: Is that how we deploy in trunk, though?

David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Huh? You can build binaries in a sandbox and then you just need the binary in your package.

[Addressing Nevzat’s question]: That’s a great question. It’s up to you! It’s no different than installing Postges or any other thing. I just build RPMs. If you’re reading the hackers mailing list nowadays, people rely on me an Christoph and others, so that we don’t inject any code into the RPMs or Debian packages. You just need to trust us not to add extra code to the packages. But if there’s a feature problem or any bug then you should complain upstream, not to us. so you should just test.

Jimmy Angelakos in chat, replying to “Go is a pretty terri…”: Find me one person that does this.

Steven Miller in chat, replying to “How to handle system…”: We don’t have OS version as one of the dimensions of version packaging but should

[Addressing vrmiguel’s comment]: Yeah, that could be done but like I don’t like complex things, that’s why I’m an RPM packager.

Jason Petersen in chat, replying to “Go is a pretty terri…”: (doesn’t go statically link binaries, or did they drop that philosophy?)

vrmiguel in chat: I think citus has changed it to citus_columnar

David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Hugo: https://github.com/hugomods/docker/tree/main/docker/hugo

David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Jason: Static except for libc, yes

Another question from Ruohang: uh how to handle RPM and extension name conflicts. I think Citus came first, so you should complain to Hydra and ask them to change the name. They shouldn’t be identical. We have something similar with Pgpool: they they are conflicting with the PCP Library ,which has been in the Linux for the last 25 years. I think Pgpool has to change their name.

Jeremy S in chat, replying to “I still update the c…”: So you think people will run the “alter extension upgrade” eh?

[Addressing Keith Fiske’s comment]: I’m not saying I don’t agree with you, but it means every time I have to update my extension version in my running database — it’s some extra work but that’s okay. It’s the user problem, not my problem.

Question from Jason [on moving an extension from one Fedora to another]: Right, it may not be safe because the GCC version may be different and other stuff may be different. One distro to another is not safe, Jason; sorry about that.

[Back to Steven’s question]: Yes, David’s answer is right.

[Addressing vrmiguel’s comment about citus_columnar]: You are right.

Jimmy I’m not going to read your comment about go because I don’t think think you can swear enough here.

vrmiguel in chat, replying to “Go is a pretty terri…”: Are there known Postgres extensions written in Go? Not sure how Go is relevant here

Jason Petersen in chat: you said “gcc” and not “libc” there, are you implying that things like compiler versions and flags need to be identical between postgres and extensions

Keith Fiske in chat, replying to “I still update the c…”: I think they should …

David Wheeler (he/him) in chat, replying to “Go is a pretty terri…”: Were some experiments a few years ago. https://pkg.go.dev/github.com/microo8/plgo

Let me continue now. First you have to know the extension exists, and then the you also need to know that the extension has an update. Unfortunately the same problem: the extension exists or has an update and they just don’t let us know.

Jimmy Angelakos in chat, replying to “Go is a pretty terri…”: @vrmiguel now you know why :D

This is a big challenge Fedora has in house solution.When you add a new package to Fedora, I think they crawl their repo once a day and if there’s new release they create a ticket in their bug tracker automatically, so that the maintainer knows there’s a new version. This can be done, but would need a volunteer to do it. Orr maybe the easiest thing is just add everything to the to PGXN,

When we update an extension we, have to make sure it doesn’t break anything. It requires some testing. As I said earlier, building is one thing, maintaining the extension is a bigger thing. If you want to raise a baby, you are responsible until until the end of your life. Consider this like your baby: either just let us know if you can’t maintain an extension anymore or please respond to the tickets that I open.

Steven Miller in chat: One other detail about compatibility dimensions. We have noticed some extensions can be complied with chip-specific instructions like AVX512, for example vector does this which optimizes performance in some cases

Alvaro Hernandez in chat, replying to “you said “gcc” and n…”: I’d reverse the question: do we have strong guarantees that there are no risks if versions and/or flags may differ?

I believe extensions are already risky in several ways, and we should diminish any other risks, like packaging ones.

So I’d say absolutely yes, compile extensions and Postgres in exactly the same way, versions and environments.

Sometimes a new minor version of an extension breaks a previous Postgres release. For example, an extension drops support for Postgres 12 even though Postgres 12 is still supported. Or they didn’t do the upgrade path work. I have to make sure everything is safe.

nils in chat, rReplying to “I think citus has ch…”: It was never changed, the extension has always either been embedded in Citus or later moved to a separate extension called citus_columner.

I think the name conflict comes from the access method being called columnar, which Citus claimed first. (Hydra’s started actually as a fork from Citus’ codebase).

(disclaimer; I work on Citus and its ecosystem)

Jason Petersen in chat, replying to “I think citus has ch…”: hi nils

Next month a new beta comes out. Everyone is happy, let’s start testing new features. For the packagers that means it’s time to start building extensions against beta-1. So a build might fail, we fix it, and then it may fail against beta-2. I understand if extension authors may want to wait until rc-1. That’s acceptable as long as they let us know. Many of them fail, and then Christoph and I create tickets against them and display them on wiki.postgresql.org. It’s a Hall of Shame!

Eric in chat: When you decide to package a new extension do you coordinate with upstream to make that decision?

David Wheeler (he/him) in chat, replying to “When you decide to p…”: I learned I had extensions in the yum repo only after the fact

Eric in chat, replying to “When you decide to p…”: I see

vrmiguel in chat: @Devrim Gündüz I’m curious about how RPM deals with extensions that depend on other Pg extensions

David Wheeler (he/him) in chat: You can test Postgres 17 in the pgxn-tools docker image today. Example: https://github.com/theory/pgtap/actions/runs/8502825052

This list pisses off the extension authors because they don’t respond to ticket. So what do we do next? It happens again and again and again, because they just don’t respond to us. On Monday uh I got a response from an extension maintainer. He said “you are talking like you are my boss!” I said, “I’m talking like I’m your user, I’m sorry. I just asked for a very specific thing.”

nils in chat: I love Devrim’s issues against our repo’s! They are like clockwork, every year 😄

David Wheeler (he/him) in chat, replying to “You can test Postgre…”: It relies on the community apt repo

Eric in chat, replying to “When you decide to p…”: Related: ever had upstream request you stop packaging an extension?

Steven Miller* in chat, replying to “One other detail abo…”: Even if compiled inside a container, on a specific chip it can get chip-specific instructions inside the binary. For example building vector on linux/amd64 inside docker on a chip with AVX512, the container image will not work on another linux/amd64 system that does not have AVX512

David Wheeler (he/him) in chat: :boss:

Unresponsive maintainers are a challenge: they don’t respond to tickets, or emails, or they don’t update the extensions for recent Postgres versions.

Don’t get me wrong even the big companies also do this, or they don’t update the extensions for the new GCC versions. I don’t expect them to test everything against all all the GCC versions; that’s that’s my problem. But just respond please.

What’s the responsibility of the packager in this case? Should we fork if they don’t respond at all? No we are not forking it! VBut going to conferences helps, because if the extension author is there I can talk to them in person in a quiet place, in a good way, just “please update the package tomorrow or you’re going to die”. Of course not this but you see what I mean.

[Looking at chat]: I’m going to skip any word about containers; sorry about that.

[Addressing Eric’s question]: That’s a good so so the question! No, actually they support us a lot, because that’s the way that people use their extensions. And do we coordinate with upstream? No, I coordinate with myself and try to build it. Of course upstream just can just create a ticket, send me email, or find me at a conference. They can say, “hey, we have an extension, could you package an RPM?” Sure, why not." I don’t coordinate with Upstream as long as uh there is no problem with the builds.

Eric in chat, replying to “When you decide to p…”: So you haven’t run into a situation where upstream did not want/appreciate you packaging for them?

[Respondinding to nils’s comment]: Thank you, thanks for responding!

[Responding to vrmiguel’s question about depending on other extensions]: We actually add dependency to that one. That’s bit of uh work, like PG rotting depends on PostGIS. In order to provide a seamless installation the PostGIS package, in the PostGIS spec file, I add an extra line that says it provides PostGiS without the version as part of the name. Then when we install pg rotting, it looks for any PostGIS package — which is fine because it can run against any PostGIS version. So I add the dependency to other extensions if we need them.

David G. Johnston in chat: The tooling ideally would report, say to PGXN or whatever the directory location for the initial application is, the issues and remind them that if the build system cannot build their extension it will not be included in the final RPM. You are an unpaid service provider for them and if they don’t meet their obligations their don’t get the benefit of the service.

[Responding to Eric’s upstream follow-up question]: I haven’t seen anything in any upstream where a person didn’t want me to package. But I haven’t seen many appreciations, either; I mean they don’t appreciate you. I’m being appreciated by EDB — money, money, money, must be funny — thanks EDB! But I haven’t had any rejections so far. Good question!

Eric in chat, replying to “When you decide to p…”: Fair. Cool. Thanks

Relying on external repos is a big problem for SUSE. Some of the maintainers just discontinue their repo. One problem with SUSE is they don’t have an EPEL-like repo. EPEL is a great thing. The barrier to add a package to EPEL is not low but not high, either. If you if you’re an advanced packager you can add a package quick enough. Of course it requires review from others. But this a big problem for SUSE.

Lack of maintenance is a problem. We have a repo but they don’t update it; so I have to go find another repo from build.opensuse.org, change it, update the website, change our build instance, etc. That’s a big problem.

David Wheeler (he/him) in chat, replying to “The tooling ideally …”: I want to add build and test success/fail matrices to extension pages on PGXN

Florents Tselai in chat: How do you handle Pl/Python-based extensions + pip dependencies? Especially with virtualenv-based installations. i.e. Pl/Python usually relies on a /usr/bin/python3, but people shouldn’t install dependencies there.

And then there’s costs! What’s the cost of RPMifying an extension? Hosting a build server? We have a very beefy bare metal build server hosted by Enterprise DB, just because I’m working for them and they have a spare machine. Hosting a build server is a cost.

I have to use some external resources for architecture reasons, like some of our build instances, like PPC 64 ,is hosted somewhere else. There are some admin tasks to keep everything and running, like EDB’s IT team actually helped me to fix an issue today in both of our PPC instances.

Jason Petersen in chat, replying to “How do you handle Pl…”: I think early on he said he makes the extensions rely on RPMs that provide those Python dependencies

David Wheeler (he/him) in chat, replying to “How do you handle Pl…”: I have used this pattern for RPMifying Perl tools

Then, maintaining build instances requires keeping them up-to-date, and also that each update doesn’t break anything. It’s not like “dnf update and build a package”. No. It may be a problem with Fedora because Fedora may can update anything any time they want. But it’s a less problem for SUSE and RedHat, but we have to take care that the updates don’t break anything.

Redhat, the company, actually follows our release schedule. We release every three months. Unless something bad happens, we know the next release is in May, on a Thursday. So every Wednesday, one day before our minor release, RedHat releases their new maintenance releases. RedHat is going to release 9.4 on Wednesday before our minor release. What does that mean for us as an RPM packager for RedHat?

*RedHat releases a new version with a new LLVM, for example, and then it means we have to rebuild the packages against the new LLVM so that people can use it. That means I have to work until Thursday morning to build the packages. That’s fine but another problem is for Rocky and Alma Linux users, because they’re are not going to have the updated LLVM package, or any any updated package, like GCC. It’s not like the old RedHat days; they change everything uh in minor versions.

So I have to rebuild GCC and LLVM on our instances, add them to our special repo “sysupdates”, which is in the config file, and this takes many hours because building GCC and LLVM is a big thing.

In the last two years I have not been able to build the from GCC Source RPM. I had to edit everything and not edit the spec files blah blah to be able to build it. I have no idea how how they can break in Source RPM.

So that’s another cost: in May I’m going to spend lots of cycles to keep up with the latest RedHat release, and also make the make the Rocky Linux and Alma Linux users happier. Maintaining build systems is not as easy as running Yup or Zypper update. It requires employing the packager — because I have the bills pay I have the beers to drink.

[Addressing Florents’s PL/Python question]: I don’t know what the PL/Python based extensions are, but I tried to get rid of everything related to pip. I’m not a developer, a DBA isn’t a developer, a Sysadmin isn’t a developer. They’re not suposed to use pip; they are supposed to use the package manager to keep up with everything. My point is if someone needs pip then I should fix it. That’s what I did for Patroni. I added lots of packages to our Git repo just to be able to support Patroni.

Ian Stanton in chat: Need to drop, thank you Devrim!

Jeremy S in chat, replying to “How do you handle Pl…”: A lot of larger companies have inventory management and risk control processes that heavily leverage package management

Alvaro Hernandez in chat: Need to go, ttyl!

vrmiguel in chat, replying to “you said “gcc” and n…”: Do you think there are no guarantees at all? For instance, Postgres loads up the extension with dlopen, which could fail with version mismatch. If that doesn’t occur and the extension loads ‘fine’, how likely do you think an issue could be?

Also I’m curious how often you’ve seen problems arise from libc itself (rather than any of the many things that could cause UB in a C program) and how these problems have manifested

Ahmet Melih Başbuğ in chat: Thank you

Conclusion

I thanked Devrim and all the discussion, and pitched the next mini-summit, where I think Jonathan Katz will talk about the TLE vision and specifics.

Thank you all for coming!

More about… Postgres PGXN Extensions PGConf Summit Yum ZYpp

Patrick Breyer

Piraten-Klage: Landgericht verweigert Entscheidung über Zulässigkeit der freiwilligen Chatkontrolle

Kurze Zeit bevor das Europäische Parlament heute die Genehmigung zur freiwilligen Chatkontrolle durch US-Dienste wie Facebook, Instagram oder GMail um zwei Jahre verlängern will, gibt das Landgericht Kiel einen Rückschlag …

Kurze Zeit bevor das Europäische Parlament heute die Genehmigung zur freiwilligen Chatkontrolle durch US-Dienste wie Facebook, Instagram oder GMail um zwei Jahre verlängern will, gibt das Landgericht Kiel einen Rückschlag der Klage des Piraten-Europaabgeordneten Dr. Patrick Breyer gegen die Maßnahme bekannt: Es traf keine Entscheidung über Zulässigkeit der freiwilligen Chatkontrolle durch Meta, sondern lehnte die Klage in erster Instanz aus formalen Gründen ab. So könne Breyer der Chatkontrolle durch manuelle Aktivierung der Ende-zu-Ende-Verschlüsselung in jedem Chat entgehen. Zum anderen könne Breyer “die Durchsetzung einer politischen Kampagne … nicht im Rahmen eines Zivilprozesses” verfolgen. Breyer kündigt Berufung gegen die Entscheidung an:

“Ich schalte jetzt das Oberlandesgericht ein, damit Meta endlich sichere Verschlüsselung für alle Chats voreinstellt und die eigenmächtige wahllose Chatkontrolle auch durch andere Anbieter als Verstoß gegen das digitale Briefgeheimnis gestoppt wird. Zuckerbergs Meta-Konzern hat im Prozess alles daran gesetzt, zu verhindern, dass über die Zulässigkeit seiner eigenmächtigen, verdachtslosen Chatkontrolle entschieden wird. Das wird ihm auf Dauer nicht gelingen.” Breyer verweist auch auf die anhängige Parallelklage eines Missbrauchsbetroffenen vor dem Landgericht Passau, die von der Gesellschaft für Freiheitsrechte unterstützt wird.

“Es macht mich wütend, dass sich US-Konzerne mithilfe fehleranfälliger Verdächtigungsmaschinen zum Privatkontrolleur unserer persönlichen Nachrichten aufschwingen wollen. Dieser Big Brother-Angriff auf unsere Handys, Privatnachrichten und Fotos mithilfe fehleranfälliger Algorithmen ist, wie wenn die Post alle Briefe öffnen und scannen würde – ineffektiv und illegal. Missbrauchsbetroffene kritisieren schon länger, dass die Chatkontrolle ihnen durch Zerstörung privater Räume bei Hilferufen, Austausch, Therapie und Beratung schadet, statt zu helfen. Unsere ohnehin überlasteten Ermittler mit vielfach falschen Maschinendenunziationen zu überfluten und dadurch Kapazitäten für Ermittlungen gegen die Hintermänner zu stehlen, hat mit Effizienz nichts zu tun. Organisierte ‘Kinderporno-Ringe’ benutzen keine unverschlüsselten Messengerdienste. Folge ist vor allem die massenhafte Kriminalisierung Jugendlicher wegen unbedachter Chatnachrichten oder einvernehmlichem Sexting – so wird das Gegenteil von Kinderschutz erreicht.” Breyer verweist auf die gestern veröffentlichte polizeiliche Kriminalstatistik, derzufolge im vergangenen Jahr fast 40% der Tatverdächtigen wegen “Kinderpornografie” minderjährig waren.

Die EU-Innenminister beraten unterdessen über eine weltweit einzigartige Folgeverordnung (Chatkontrolle 2.0), welche die Chatkontrolle verpflichtend machen und sichere Ende-zu-Ende-Verschlüsselung zerstören soll.

Breyers Infoportal zur Chatkontrolle


Webistemology - John Wunderlich

Eclipse Photos

xkcd nails it again.

xkcd nails it again.


Patrick Breyer

Etappensieg: Internet-Vorratsdatenspeicherung abgesagt – Quick Freeze nachbesserungsbedürftig

SPD, Grüne und FDP haben laut Medienberichten den Weg frei gemacht für das von Justizminister Buschmann vorgeschlagene Quick-Freeze-Verfahren. Der Europaabgeordnete Dr. Patrick Breyer (Piratenartei) kommentiert: „Diese Koalitionsentscheidung ist ein Erfolg …

SPD, Grüne und FDP haben laut Medienberichten den Weg frei gemacht für das von Justizminister Buschmann vorgeschlagene Quick-Freeze-Verfahren.

Der Europaabgeordnete Dr. Patrick Breyer (Piratenartei) kommentiert:

„Diese Koalitionsentscheidung ist ein Erfolg für die Bürgerrechtsbewegung, die seit Jahrzehnten auf der Straße und vor Gericht gegen die Idee einer flächendeckenden Totalerfassung unserer Kontakte, Bewegungen und Internetverbindungen kämpft. Wie im Koalitionsvertrag vereinbart setzt die Ampelkoalition auf eine effektive und anlassbezogene Datenspeicherung (Quick Freeze) und erteilt der extremen SPD- und Unionsforderung nach einer flächendeckenden Internet-Vorratsdatenspeicherung eine klare Absage. Diese wäre ein Dammbruch und ein Frontalangriff auf unser Recht auf anonyme Internetnutzung, auf das unzählige Menschen angewiesen sind.

In einem Punkt ist Buschmanns Quick Freeze-Gesetzentwurf aber massiv nachbesserungsbedürftig: Es fehlt die Vorgabe, dass die betroffenen Personen und Anschlüsse in der Freeze-Anordnung genau bezeichnet werden müssen. Mit dem bisherigen Wortlaut könnten Staatsanwälte und Amtsgerichte flächendeckend die Kommunikationsdaten sämtlicher Bürger auf Vorrat speichern lassen, weil sie für ein Ermittlungsverfahren künftig von Nutzen sein könnten – das muss dringend nachgebessert werden.

Die Koalitionseinigung ist auch nur ein Etappensieg: Die EU plant nach der Europawahl einen neuen Anlauf zur EU-weiten Wiedereinführung einer IP-Vorratsdatenspeicherung, die Vorrang vor deutschem Recht hätte. Das undemokratische Prinzip der anlasslosen Speicherung muss auch von der Tagesordnung der EU-Kommission und der Regierungen anderer Mitgliedsländer verschwinden – die EU-Arbeitsgruppe #EUGoingDark, die für eine EU-weite Wiedereinführung der Vorratsdatenspeicherung sorgen soll, gehört aufgelöst.

Und sollte der Ampel eine Große Koalition folgen, werden die schwarz-roten Grundrechtsterroristen unsere Privatsphäre in der Luft zerreißen. Wir Piraten werden als parlamentarischer Arm der Bürgerrechtsbewegung im Kampf gegen Massenüberwachung nicht locker lassen!“

Die Spitzenkandidatin der Piratenpartei zur EU-Wahl, Anja Hirschel kommentiert:

„Auf anlasslose Vorratsdatenspeicherung zu verzichten ist eine richtige und kluge Entscheidung für die wir lange gekämpft haben! Wir müssen aber auch jetzt genau hinschauen: Quick Freeze ist ein anlassbezogenes und machtvolles Ermittlungsinstrument, dass durch Zugriff auf E-Mail-Adressen, Telefonnummern, IP-Adressen, Standortdaten und weitere hoch-sensible Daten sehr tief in die Grundrechte von Bürgerinnen und Bürgern eingreifen kann. Wir müssen sicherstellen, dass Quick Freeze grundrechtskonform ausgestattet wird und ausschließlich maßvoll, demokratisch und transparent eingesetzt wird. Die Möglichkeiten, QuickFreeze einzusetzen müssen so ausgestaltet und gesichert sein, dass sie nicht von Feinden der Demokratie missbraucht werden können.


Simon Willison

Three major LLM releases in 24 hours (plus weeknotes)

I'm a bit behind on my weeknotes, so there's a lot to cover here. But first... a review of the last 24 hours of Large Language Model news. All times are in US Pacific on April 9th 2024. 11:01am: Google Gemini Pro 1.5 hits general availability, here's the blog post - their 1 million token context GPT-4 class model now has no waitlist, is available to anyone in 180 countries (not including Europ

I'm a bit behind on my weeknotes, so there's a lot to cover here. But first... a review of the last 24 hours of Large Language Model news. All times are in US Pacific on April 9th 2024.

11:01am: Google Gemini Pro 1.5 hits general availability, here's the blog post - their 1 million token context GPT-4 class model now has no waitlist, is available to anyone in 180 countries (not including Europe or the UK as far as I can tell) and most impressively all the API has a free tier that allows up to 50 requests a day, though rate limited to 2 per minute. Beyond that you can pay $7/million input tokens and $21/million output tokens, which is slightly less than GPT-4 Turbo and a little more than Claude 3 Sonnet. Gemini Pro also now support audio inputs and system prompts. 11:44am: OpenAI finally released the non-preview version of GPT-4 Turbo, integrating GPT-4 Vision directly into the model (previously it was separate). Vision mode now supports both functions and JSON output, previously unavailable for image inputs. OpenAI also claim that the new model is "Majorly improved" but no-one knows what they mean by that. 6:20pm (3:20am in their home country of France): Mistral tweet a link to a 281GB magnet BitTorrent of Mixtral 8x22B - their latest openly licensed model release, significantly larger than their previous best open model Mixtral 8x7B. I've not seen anyone get this running yet but it's likely to perform extremely well, given how good the original Mixtral was.

And while it wasn't released today (it came out last week), this morning Cohere's Command R+ (an excellent openly licensed model) reached position 6 on the LMSYS Chatbot Arena Leaderboard - the highest ever ranking for an open weights model.

Since I have a lot of software that builds on these models, I spent a bunch of time today publishing new releases of things.

Datasette Extract with GPT-4 Turbo Vision

I've been working on Datasette Extract for a while now: it's a plugin for Datasette that adds structured data extraction from unstructured text, powered by GPT-4 Turbo.

I updated it for the new model releases this morning, and decided to celebrate by making a video showing what it can do:

I want to start publishing videos like this more often, so this felt like a great opportunity to put that into practice.

The Datasette Cloud blog hasn't had an entry in a while, so I published screenshots and notes there to accompany the video.

Gemini Pro 1.5 system prompts

I really like system prompts - extra prompts you can pass to an LLM that give it instructions about how to process the main input. They're sadly not a guaranteed solution for prompt injection - even with instructions separated from data by a system prompt you can still over-ride them in the main prompt if you try hard enough - but they're still useful for non-adversarial situations.

llm-gemini 0.1a2 adds support for them, so now you can do things like this:

llm -m p15 'say hi three times three different ways' \ --system 'in spanish'

And get back output like this:

¡Hola! 👋 ¡Buenos días! ☀️ ¡Buenas tardes! 😊

Interestingly "in german" doesn't include emoji, but "in spanish" does.

I had to reverse-engineer the REST format for sending a system prompt from the Python library as the REST documentation hasn't been updated yet - notes on that in my issue.

datasette-enrichments-gpt using GPT-4 Turbo

Another small release: the datasette-enrichments-gpt plugin can enrich data in a table by running prompts through GPT-3.5, GPT-4 Turbo or GPT-4 Vision. I released version 0.4 switching to the new GPT-4 Turbo model.

Everything else

That covers today... but my last weeknotes were nearly four weeks ago! Here's everything else, with a few extra annotations:

Blog entries

All five of my most recent posts are about ways that I use LLM tools in my own work - see also my How I use LLMs and ChatGPT series.

Building files-to-prompt entirely using Claude 3 Opus Running OCR against PDFs and images directly in your browser llm cmd undo last git commit - a new plugin for LLM Building and testing C extensions for SQLite with ChatGPT Code Interpreter Claude and ChatGPT for ad-hoc sidequests Releases

Many of these releases relate to ongoing work on Datasette Cloud. In particular there's a flurry of minor releases to add descriptions to the action menu items added by various plugins, best illustrated by this screenshot:

datasette-enrichments-gpt 0.4 - 2024-04-10
Datasette enrichment for analyzing row data using OpenAI's GPT models llm-gemini 0.1a2 - 2024-04-10
LLM plugin to access Google's Gemini family of models datasette-public 0.2.3 - 2024-04-09
Make specific Datasette tables visible to the public datasette-enrichments 0.3.2 - 2024-04-09
Tools for running enrichments against data stored in Datasette datasette-extract 0.1a4 - 2024-04-09
Import unstructured data (text and images) into structured tables datasette-cors 1.0 - 2024-04-08
Datasette plugin for configuring CORS headers asgi-cors 1.0 - 2024-04-08
ASGI middleware for applying CORS headers to an ASGI application files-to-prompt 0.2.1 - 2024-04-08
Concatenate a directory full of files into a single prompt for use with LLMs datasette-embeddings 0.1a3 - 2024-04-08
Store and query embedding vectors in Datasette tables datasette-studio 0.1a3 - 2024-04-06
Datasette pre-configured with useful plugins. Experimental alpha. datasette-paste 0.1a5 - 2024-04-06
Paste data to create tables in Datasette datasette-import 0.1a4 - 2024-04-06
Tools for importing data into Datasette datasette-enrichments-quickjs 0.1a2 - 2024-04-05
Enrich data with a custom JavaScript function s3-credentials 0.16.1 - 2024-04-05
A tool for creating credentials for accessing S3 buckets llm-command-r 0.2 - 2024-04-04
Access the Cohere Command R family of models llm-nomic-api-embed 0.1 - 2024-03-30
Create embeddings for LLM using the Nomic API textract-cli 0.1 - 2024-03-29
CLI for running files through AWS Textract llm-cmd 0.1a0 - 2024-03-26
Use LLM to generate and execute commands in your shell datasette-write 0.3.2 - 2024-03-18
Datasette plugin providing a UI for executing SQL writes against the database TILs impaste: pasting images to piped commands on macOS - 2024-04-04 Installing tools written in Go - 2024-03-26 Google Chrome --headless mode - 2024-03-24 Reviewing your history of public GitHub repositories using ClickHouse - 2024-03-20 Running self-hosted QuickJS in a browser - 2024-03-20 Programmatically comparing Python version strings - 2024-03-17

Gemini 1.5 Pro public preview

Gemini 1.5 Pro public preview Huge release from Google: Gemini 1.5 Pro - the GPT-4 competitive model with the incredible 1 million token context length - is now available without a waitlist in 180+ countries (including the USA but not Europe or the UK as far as I can tell)... and the API is free for 50 requests/day (rate limited to 2/minute). Beyond that you'll need to pay - $7/million input

Gemini 1.5 Pro public preview

Huge release from Google: Gemini 1.5 Pro - the GPT-4 competitive model with the incredible 1 million token context length - is now available without a waitlist in 180+ countries (including the USA but not Europe or the UK as far as I can tell)... and the API is free for 50 requests/day (rate limited to 2/minute).

Beyond that you'll need to pay - $7/million input tokens and $21/million output tokens, which is slightly less than GPT-4 Turbo and a little more than Claude 3 Sonnet.

They also announced audio input (up to 9.5 hours in a single prompt), system instruction support and a new JSON mod.

Via @liambolling


Mistral tweet a magnet link for mixtral-8x22b

Mistral tweet a magnet link for mixtral-8x22b Another open model release from Mistral using their now standard operating procedure of tweeting out a raw torrent link. This one is an 8x22B Mixture of Experts model. Their previous most powerful openly licensed release was Mixtral 8x7B, so this one is a whole lot bigger (a 281GB download) - and apparently has a 65,536 context length, at least ac

Mistral tweet a magnet link for mixtral-8x22b

Another open model release from Mistral using their now standard operating procedure of tweeting out a raw torrent link.

This one is an 8x22B Mixture of Experts model. Their previous most powerful openly licensed release was Mixtral 8x7B, so this one is a whole lot bigger (a 281GB download) - and apparently has a 65,536 context length, at least according to initial rumors on Twitter.

Tuesday, 09. April 2024

Ben Werdmüller

Any business that depends on third-party APIs ...

Any business that depends on third-party APIs that it does not control and is locked into using is not a good business.

Any business that depends on third-party APIs that it does not control and is locked into using is not a good business.


Writing a web-first resumé

I hate writing resumés. There’s always been something about the format that never really sat right with me; each entry presents work I’ve done, sometimes representing many multiple years of my life, without explaining the “why” or the through-line of how I got there. It’s always felt to me like they’re missing the fundamentals of the human being behind them — the values and mindsets of the livin

I hate writing resumés. There’s always been something about the format that never really sat right with me; each entry presents work I’ve done, sometimes representing many multiple years of my life, without explaining the “why” or the through-line of how I got there. It’s always felt to me like they’re missing the fundamentals of the human being behind them — the values and mindsets of the living, breathing person you’d actually be hiring.

On top of that, mine was stuck in a document, when all of my work is on the web.

So I set out to rewrite mine, make it web-first, and turn it into something that I feel like actually represents me and my career. You can check it out here: I gave it its own subdomain at resume.werd.io.

The default version now displays my career in chronological order, each life event building on the next, and includes things that, while not technically being work, informed the work I’ve subsequently done. You’ll find stuff like the electronic hypertext magazine I built as a teenager and the popular meme site I built at university, which wouldn’t normally find their way into a standard resumé, but do help define my expertise and how I approach projects.

From that page, you can flip to a more traditional reverse-chronological resumé that omits the non-employment content and removes unnecessary description. Because it’s an edited-down version of my story, rather than a collection of bullet points, it’s probably better than any resumé I’ve had in the past. It’s web-first but there are a few print-only CSS rules in there to make sure it prints or turns into a PDF really nicely. It’s also nicely responsive, so it displays well on both a laptop / desktop and a phone.

Behind the scenes, here’s how it works:

Each element of my experience is a standalone HTML snippet file of the form startyear-endyear.html. Some are annotated to be “narrative-only”, which means that they won’t show up in the traditional view. The engine I wrote for this will also omit any entries marked “traditional-only” from the narrative view, but in practice I didn’t use that. In narrative view, the engine reads the files in alphabetical order. In traditional view, the engine reads them in reverse-alphabetical order. The effect is to order the entries chronologically or reverse-chronologically as described. Educational entries are marked as such. In narrative view they’re placed into the main flow; in traditional view they’re displayed at the end of the document. In narrative view, any HTML element with a class traditional is marked as display:none in order to hide it; in traditional view, any HTML element with a class narrative is hidden. This allows me to mix narrative content in with more traditional resumé content and only display them in one view. Everything is marked up with the h-resume microformat.

The most important thing for me was being able to write my story as a story, and then edit down accordingly. I’m pretty happy with it! Let me know what you think.

 

Syndicated to IndieNews


Jon Udell

Code in Context: How AI Can Help Improve Our Documentation

Here’s the latest installment in the series on working with LLMS: https://thenewstack.io/code-in-context-how-ai-can-help-improve-our-documentation/. Writing documentation from scratch is as uncommon as writing code from scratch. More typically, you’re updating or expanding or refactoring existing docs. My expectation was that an LLM-powered tool primed with both code and documentation could provide

Here’s the latest installment in the series on working with LLMS: https://thenewstack.io/code-in-context-how-ai-can-help-improve-our-documentation/.

Writing documentation from scratch is as uncommon as writing code from scratch. More typically, you’re updating or expanding or refactoring existing docs. My expectation was that an LLM-powered tool primed with both code and documentation could provide a powerful assist, and Unblocked did.

I don’t know how to measure the boost it gave me. But I do know that I’ll never again want to undertake this kind of project without a tool that can help me assemble the necessary context.

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

18 Using AI to Improve Bad Business Writing

Monday, 08. April 2024

Doc Searls Weblog

Aviation vs. Eclipse

Contrails form behind jet aircraft flying through the stratosphere. Since high-altitude aviation is happening all around the earth more or less constantly, planes are painting the sky everywhere. (Here is one time-lapse. And another. And one of my own.) Many contrails don’t last, of course, but many do, and together they account for much of […]
Contrails in the stratosphere, smearing sideways into broad cloud cover.  This view is toward the place in the sky where a full solar eclipse will happen a few hours later.

Contrails form behind jet aircraft flying through the stratosphere. Since high-altitude aviation is happening all around the earth more or less constantly, planes are painting the sky everywhere. (Here is one time-lapse. And another. And one of my own.)

Many contrails don’t last, of course, but many do, and together they account for much of the cloud cover we see every day. The altocumulus, altostratus, and cirrus clouds that contrails produce are now officially recognized as homogenitus and homomutatus, which are anthropogenic: owing to human activity.

And today, Eclipse Day, Delta is offering to fly you along the path of totality. Others too? I don’t know. I’m taking a few moments to write this before we walk up to our hilltop cemetery to watch the eclipse for over four minutes, thanks to our lucky location near the very center of Totality.

I’m curious to see and hear contrail reports from others now awaiting their few minutes out of the sun.

1:14pm—The moon’s shadow made landfall in Mexico a short time ago. Here in Bloomington, the sky is well-painted by contrails. Mostly it looks like high-altitude haze, but believe me: if it weren’t for commercial aviation, the sky would be solid blue. Because the contrails today are quickly smeared sideways, losing their form but not their color.

5:00pm—Contrails were aplenty, and a spread-out contrail did slide in front of the sun and the moon…

but it was still a spectacular sight:


Patrick Breyer

Beschwerde gegen ZDF-Sendung zu Vorratsdatenspeicherung

Am 12. November 2023 habe ich eine Programmbeschwerde beim Fernsehrat zur ZDF-Sendung »Kampf gegen sexuelle Gewalt gegen Kinder« vom selben Tag eingereicht. Der Beitrag von »Berlin direkt – …

Am 12. November 2023 habe ich eine Programmbeschwerde beim Fernsehrat zur ZDF-Sendung »Kampf gegen sexuelle Gewalt gegen Kinder« vom selben Tag eingereicht. Der Beitrag von »Berlin direkt – dem politischen Magazin aus der Hauptstadt« enthält inhaltliche Fehler und stellt die Themen Vorratsdatenspeicherung, Quick Freeze und Kinderschutz einseitig dar.

Beschwerde unbegründet abgebügelt

ZDF-Intendant Dr. Norbert Himmler hat Mitte Dezember 2023 meine Beschwerde mit einem Schreiben (PDF) abgebügelt (mein Tröt und Tweet dazu). Weil auf wesentliche Punkte nicht eingegangen wurde und die Antwort stellenweise inhaltlich nicht korrekt ist, habe ich im Januar 2024 eine ausführliche Beschwerde an den Beschwerdeausschuss gesendet. Die Vorsitzende des ZDF-Fernsehrates, Marlehn Thieme, teilte mir Mitte März in einem äußerst kurzen Schreiben (PDF) mit, dass der Fernsehrat meine Beschwerde „abschließend als unbegründet zurückgewiesen hat“. Ihr Schreiben geht auf keinen einzigen meiner Kritikpunkte ein, stimmt meiner Beschwerde „in der Sache“ zu und verweist auf die Kürze des Beitrags, als Begründung für dessen unausgewogene Darstellung des Themas, wobei eine „bewusste (!) Einseitigkeit (…) nicht belegbar” sei. Als einziges Ergebnis meiner Beschwerde wurde mir versichert: Eine „inhaltlich fundierte Beschwerde [bleibt] im ZDF nicht ohne Wirkung. Die intensive Diskussion mit den Programmverantwortlichen (…), führt zu einem konstruktiven Umgang mit den Inhalten der Beschwerde und, wo nötig, auch zu Reaktionen in der redaktionellen Arbeit.“

Reaktionen in der redaktionellen Arbeit?

Die Reaktion des ZDF auf meine ausführliche Beschwerde ist nicht angemessen – natürlich müssen auch kurze Beiträge sachlich korrekt und ausgewogen sein, insbesondere wenn es um äußerst kontroverse, aktuelle politische Gesetzesvorhaben geht, wie den Streit um die Vorratsdatenspeicherung. Seit Jahrzehnten wird mit Scheinargumenten und Falschbehauptungen Stimmung gemacht für eine anlasslose Speicherung von Kommunikationsdaten von allen Bürgerinnen und Bürgern. Erinnert sei exemplarisch an Sigmar Gabriel und seine Behauptung, dass Vorratsdatenspeicherung bei den Anschlägen in Oslo bei der schnelleren Aufklärung geholfen hätte oder die Darstellung, dass Vorratsdatenspeicherung notwendig sei, obwohl Studien keine Belege für eine positive Wirkung erkennen. Journalismus sollte hier auf kritische Distanz gehen. Wir Piraten werden die redaktionelle Arbeit zum Thema Vorratsdatenspeicherung weiter kritisch verfolgen!

Meine Programmbeschwerde vom 12.11.2023

Verletzung der Sorgfaltpflicht nach Medienstaatsvertrag durch vermeidbare Verbreitung folgender falscher Informationen. Konkret:

Nicht nur die FDP, sondern auch die Grünen lehnen dem Koalitionsvertrag entsprechend verdachtslose IP-Vorratsdatenspeicherung ab. Hätte eine einfache Anfrage bestätigt, ist auch nachzulesen. Die Missbrauchsbeauftragte ist zwar Mitglied der Grünen, hat aber natürlich in ihrer Funktion gesprochen und nicht für die Partei. NCMEC durchforstet NICHT systematisch Bildmaterial, sie leiten Verdachtsmeldungen der Konzerne als Clearingstelle weiter (“CyberTipline”). Siehe https://www.missingkids.org/theissues/csam#whatncmecisdoingaboutit Verdächtige können (auch laut BKA) NICHT nur anhand IP-Adresse, sondern oft auch über die Telefonnummer oder die Mailadresse ermittelt werden. Siehe https://www.bka.de/SharedDocs/Kurzmeldungen/DE/Kurzmeldungen/230623_Mindestspeicherfristen_IP-Adressen.html
Darüber hinaus wäre eine Ermittlung auch beim nächsten Login möglich (sog. Login-Falle).

Insgesamt ist der Beitrag weit von einer ausgewogenen Darstellung entfernt. Nur ein Kritiker kommt mit drei Sätzen zu Wort, ansonsten ausschließlich Befürworter einer IP-Vorratsdatenspeicherung. Die Diskussionslage wird dadurch vollkommen falsch abgebildet. Bei einer Anhörung im Bundestag etwa haben viele unabhängige Kritiker gesprochen. Durch einseitige Auswahl der Gesprächspartner wird auch der Diskussionsstand einseitig dargestellt und Gegenargumente fehlen fast vollständig.

Meine Ausführliche Beschwerde vom 11.01.2024

Sehr geehrte Frau Thieme,

die Stellungnahme des Intendanten vom 13. Dezember 2023 zu einigen Punkten aus meiner Beschwerde vom 12. November 2023 habe ich dankend erhalten. Ich möchte meiner Beschwerde wegen Verletzung der Sorgfaltpflicht nach Medienstaatsvertrag durch vermeidbare Verbreitung folgender falscher Informationen wie folgt Nachdruck verleihen und fordere eine Behandlung der Beschwerde im Fernsehrat:

Die Stellungnahme geht nicht darauf ein, dass der Beitrag ab Minute 02:00 fälschlich suggeriert, dass innerhalb der Ampelregierung lediglich die FDP gegen eine Vorratsdatenspeicherung sei: „Keine gesetzlich geregelte Speicherpflicht, keine Datenüberwachung, eine zerstrittene Ampel. Die SPD dafür, die FDP strikt dagegen.“ Auch die zur Ampel gehörenden Grünen lehnen die Vorratsdatenspeicherung ab und unterstützen den Quick-Freeze-Vorschlag, was der Beitrag verschweigt. Der Intendant räumt ein, dass die gezeigte Unabhängige Beauftragte für Fragen des sexuellen Kindesmissbrauchs, Kerstin Claus, nicht für die Grünen gesprochen hat. Warum wird dann zu ihren Aussagen immerzu die Parteimitgliedschaft groß eingeblendet? Insofern werden Zuschauende in diesem Punkt in die Irre geführt.

Auf meinen zweiten Kritikpunkt, das US-Kinderschutzzentrum NCMEC durchforste entgegen der Behauptung im Beitrag nicht systematisch problematisches Bildmaterial, behauptet der Intendant nun, die von Providern eingegangenen Verdachtsmeldungen würden von NCMEC systematisch durchforstet. Auch dies ist falsch. Die millionenfachen automatisierten Verdachtsmeldungen von Algorithmen der Industrie werden vom NCMEC nicht systematisch durchforstet, sondern in einem automatisierten Verfahren an Strafverfolgungsbehörden weiter geleitet bzw. diesen zugänglich gemacht. NCMEC agiert in aller Regel bloß als Clearingstelle (“CyberTipline”), siehe missingkids.org („NCMEC staff may review content reported to the CyberTipline and then the reports are made available to law enforcement for their independent review“).

Auch auf den folgenden Kritikpunkt meiner Beschwerde geht der Intendant nicht ein, darum präzisiere ich diesen noch einmal: Bei Minute 1:31 sagt Kriminalhauptkommissar  Thorsten Ivers: „Die einzige Möglichkeit den Täter zu ermitteln ist über die sogenannte IP-Adresse, also eine eindeutige Bezeichnung eines Telekommunikationsanschlusses.“ Das ist falsch, da Verdächtige oder Täter auch über, wie etwa das BKA informiert, Telefonnummern oder E-Mail-Adressen, die beim Anlegen eines Accounts angegeben werden müssen, identifiziert werden können. Eine weitere Möglichkeit ist etwa die Login-Falle. Laut Bundesregierung gingen 2021 78.600 Hinweise von NCMEC ein, davon wurden 62.300 als strafrechtlich relevant eingestuft, aber nur 2.150 Fälle (3%) konnten mangels gespeicherter IP-Adresse oder anderer Anhaltspunkte nicht weiter verfolgt werden (BT-Drs. 20/534, S. 27 f.).Insgesamt wird bei Kinderpornografie im Netz laut Kriminalstatistik eine Aufklärungsquote von über 90% erreicht, weswegen die Worte im Beitrag „Deutschland tappt im Dunkeln“ fernab der Realität sind.

In allen drei am Anfang des Beitrags genannten Fällen (Lügde, Bergisch Gladbach und Münster) waren es nicht IP-Adressen, die zu den Tätern führten, so dass auch hier ein falscher Eindruck erweckt wird. Die falsche Aussage von Herrn Ivers wird weder im Text des Beitrags, noch durch einen anderen O-Ton korrigiert, was zur Folge hat, dass Zuschauende an dieser Stelle falsch informiert werden.

Der Intendant behauptet in seiner Stellungnahme, dass die in meiner Beschwerde erwähnte Anhörung im Bundestag „nicht Gegenstand des Beitrags“ sei. Erwähnt habe ich die Anhörung vom 11. Oktober 2023, um deutlich zu machen, dass, sich, anders als im Beitrag der Eindruck erweckt wird, Expertinnen und Experten uneins bei dem Thema sind. In dem Beitrag wird bei Minute 03:06 als einzige Stellungnahme aus dieser Anhörung die des Deutschen Richterbunds zitiert und auch bildlich gezeigt, so dass aus der Anhörung selektiv, einseitig und unausgewogen zitiert wird. Anderslautende Stellungnahmen etwa vom Digitale Gesellschaft e.V., vom Bundesbeauftragten für den Datenschutz und die Informationsfreiheit, der Gesellschaft für Freiheitsrechte e.V. oder von der Kinderrechtsexpertin Dr. Sabine Witting werden im Beitrag weder erwähnt noch zitiert. An dieser Stelle werden Zuschauende klar einseitig informiert.

In seiner Antwort schreibt der Intendant, dass sich die Autorin auf die Auseinandersetzung über den Gesetzentwurf des Bundesjustizministers Marco Buschmann zur Einführung eines Quick-Freeze-Verfahrens konzentriert habe. Das ist nicht der Fall, da überwiegend die Notwendigkeit einer Vorratsdatenspeicherung behauptet wird, aber gerade nicht ausgewogen über Vor- und Nachteile des Quick-Freeze-Verfahrens informiert wird. Die Kritik an einer verdachtslosen und flächendeckenden IP-Vorratsdatenspeicherung kommt für eine ausgewogene Darstellung der Debatte zu kurz. Der Umstand, dass eine von vier im Beitrag sprechenden Personen eine Vorratsdatenspeicherung ablehnt, schafft hier keine journalistische Ausgewogenheit.

Grund- und Freiheitsrechte werden im Beitrag lediglich an einer Stelle kurz und implizit erwähnt, obwohl sie in der Debatte eine wesentliche Rolle spielen und in der Kurzbeschreibung des Beitrags im Netz explizit genannt werden. Im Unterschied zum Titel des Beitrags ist der Inhalt nicht neutral, sondern leitend. Während der Titel („Kampf gegen sexuelle Gewalt gegen Kinder“) eine ausgeglichene Darstellung von Maßnahmen gegen sexualisierte Gewalt gegen Kinder erwarten lässt, räumt der Beitrag die meiste Zeit Argumenten für Vorratsdatenspeicherung ein und endet mit einem wertenden Urteil bei 03:58 Minute: „Die Politik schaut tatenlos zu, setzt lieber auf Datenschutz statt Kinderschutz“. Durch diesen vermeintlichen Gegensatz wird das Grundrecht auf informationelle Selbstbestimmung diffamiert, obwohl nach § 3 Medienstaatsvertrag auch die Achtung vor der Freiheit anderer zu stärken ist. Zuschauenden wird suggeriert, dass das Problem beim Datenschutz liege, während der aktuelle Stand der Diskussion um wirksame und überwachungsfreie Maßnahmen für besseren Schutz von Kindern und Jugendlichen im Beitrag unerwähnt bleibt. (Hierzu: Der Kinderschutzbund sieht auch ohne Vorratsdatenspeicherung ausreichend Möglichkeiten im Kampf gegen Pädokriminalität, das Positionspapier der SPD-Bundestagsfraktion: Kinder vor sexualisierter Gewalt schützen Kinderschutz geht uns alle an, netzpolitik.org: Sexualisierte Gewalt gegen Kinder: „Der Gesamtüberblick fehlt komplett“, das neu erschienene Buch zum Thema »Direkt vor unseren Augen« oder Digitalcourage: Alles für den Schutz von Kindern) Das Grundrecht auf Anonymität schützt Kinder, Frauen, Stalkingopfer, sexuelle Minderheiten, politische Aktivisten und viele mehr im Netz, schützt vor falschem Verdacht, bewahrt die Informationsfreiheit, ermöglicht anonyme Veröffentlichungen und Leaks. Der Beitrag erklärt leider in keiner Weise, was IP-Adressen sind und warum sie einem Schutz unterliegen. Insgesamt ist der Beitrag weit von einer ausgewogenen und sorgfältigen Darstellung entfernt.


Ben Werdmüller

I tried Tesla FSD

So, let’s get this out of the way first: I drive a Tesla Model 3. Yes, I know. And yes, even though I bought it a few years ago, there was plenty of evidence of the CEO’s bad behavior and of poor conditions in the factories. And I let my desire to drive an electric car override all that, and it was a poor decision, and now I’m stuck with it because it was also a very bad financial decision and

So, let’s get this out of the way first: I drive a Tesla Model 3. Yes, I know. And yes, even though I bought it a few years ago, there was plenty of evidence of the CEO’s bad behavior and of poor conditions in the factories. And I let my desire to drive an electric car override all that, and it was a poor decision, and now I’m stuck with it because it was also a very bad financial decision and it’s depreciated at such a rate that the diff of the value of the car and the remaining value of my car loan are not in a good place. It was silly. Can we just accept that? Okay, thank you. Moving on.

This month Tesla switched Full Self Driving (FSD) on .. as far as I can tell, every single car. It’s normally a five figure software-only upgrade, which you can pay for up-front or at the ongoing subscription price of $199 a month. But usage of the feature has been low, perhaps in part because if you’ve spent $40-90,000 on a car, spending another five figure sum is annoying, and perhaps in part because people generally value their own lives and the lives of the people they love. So now, for this month only, it’s free, and Tesla can juice their numbers.

And, yes, I tried it.

And let me be clear: nope. Nope nope nope nope nope nope nope.

I’ve rarely been so frightened behind the wheel of my own car. As an experience it ranks right up there with driving that one stretch of the 880 where you’re supposed to go at 45mph and everybody drives twice that, the time I was riding in a Lyft and was jackknifed by a tow truck, or that one time I was driving on the 101 North and a tech bro Lamborghini shot out of the Highway 12 offramp and came within two inches of hitting me, spun around in the road, and careened off into the distance.

To be added to those heart-stopping experiences soon, I have to surmise, is being hit by a Tesla in Full Self Driving Mode.

Unlike my Lambo encounter, I was driving at 25mph down my local road to the store. On this 10 minute drive (5 minutes there and back), my car clung surprisingly close to cars parked on the side of the road — but more importantly, I had to intervene once because, on a stretch of road that had been coned off and narrowed into one lane for both directions, it felt like the car was driving directly into an oncoming vehicle. It’s possible (and, in fact, quite likely) that the car wouldn’t have smashed into the oncoming Subaru. But it felt like it was about to, and I had no desire to make it a scientific test. FSD requires you to keep your hands on the wheel — a lean-back experience this is not — so safely intervening was very natural. The car then asks you to leave a recorded message explaining why you intervened, and the struggle is to use words instead of screaming over and over.

What’s particularly surprising to me is that FSD wasn’t just downloaded to my vehicle — it was switched on by default. At no point did I need to agree to the terms and conditions. All I needed to do was enter a destination into the navigation and pull the gear stalk down once (as if I was simply changing into another gear), and off it went. Autopilot, which is what Tesla calls its cruise control feature, is engaged in a very similar way, so I can see a world where a driver might even switch it on unintentionally.

So would I pay the money for it? Fear on this level seems like a pretty poor use of $12,000 or $199 a month. You can rent a horror movie for $2.99, and paying attention to America is free. But clearly I was curious. Genuinely, if the technology improved to the point where I didn’t feel like I was probably going to die, I would happily sit inside a self-driving vehicle. Less so, perhaps, for going to the store, but definitely for road trips. (Can you imagine an autonomous RV? It would cost an arm and a leg, but if I had the money, I would 100% use one to get across country.) Right now, though, I would rather pour vinegar onto my eyeballs. For those readers who don’t need to think hard about dropping $12,000 — I am not one of them — I would encourage you to spend the money on more caviar or whatever. This ain’t it.


Patrick Breyer

Piraten fordern Stopp der Pläne zum EU-Gesundheitsdatenraum

Die federführenden Ausschüsse des Europäischen Parlaments LIBE und ENVI sollen morgen die Verordnung zur Schaffung eines „Europäischen Gesundheitsdatenraums“ (EHDS) absegnen. Danach sollen zu allen Patienten Informationen über jede medizinische Behandlung einschließlich …

Die federführenden Ausschüsse des Europäischen Parlaments LIBE und ENVI sollen morgen die Verordnung zur Schaffung eines „Europäischen Gesundheitsdatenraums“ (EHDS) absegnen. Danach sollen zu allen Patienten Informationen über jede medizinische Behandlung einschließlich Impfstatus, Medikamente und Schwangerschaften, Labor- und Entlassberichte digital gespeichert werden – auch zu den in Deutschland bisher nicht von der elektronischen Patientenakte betroffenen Privatpatienten. Zugang sollen europaweit eine Vielzahl von Stellen erhalten können. Als Schattenberichterstatter und Mitverhandler der Verordnung für den Innenausschuss (LIBE) warnt der Europaabgeordnete der Piratenpartei Dr. Patrick Breyer vor einem Kontrollverlust der Patienten über sensible Gesundheitsdaten und einer Aufgabe des Arztgeheimnisses.

“Informationen über unsere körperliche und geistige Gesundheit sind äußerst sensibel. Wenn wir uns nicht darauf verlassen können, dass diese Informationen von unseren behandelnden Ärzten vertraulich behandelt werden, lassen wir uns vielleicht nicht mehr behandeln und steigt gar das Suizidrisiko. Die EU lässt sensibelste Patientenakten anhäufen, vernetzen und weitergeben, ohne aber die Kontrolle und Selbstbestimmung der Patienten über ihre Daten sicherzustellen. ‚Alles geht, nichts muss‘ ist kein Ansatz, dem Patienten vertrauen können. Ohne Vertrauen kann ein Europäischer Gesundheitsdatenraum nicht funktionieren. Laut Umfragen wollen mehr als 80% der EU-Bürger selbst über die Weitergabe ihrer Patientenakten entscheiden. Mehrheitlich wollen sie um Einwilligung gebeten werden. Der EU-Deal ist weit davon entfernt. Er verrät die Interessen und den Willen der Patienten, um ihre Daten an Big Tech und Pharmariesen zu verkaufen. Der mit dieser Verordnung einhergehenden Entmündigung von Patienten erteilen wir Piraten eine klare Absage.

Ein europaweiter Zwang zur elektronischen Patientenakte konnte unter anderem auf meine Initiative hin verhindert werden. Laut Artikel 8h und Erwägungsgrund 13a der Verordnung ist das deutsche und österreichische Widerspruchsrecht gegen die Einrichtung einer elektronischen Patientenakte gerettet. Im Fall eines Widerspruchs werden die Pflichtinformationen nur beim behandelnden Arzt gespeichert. Ich selbst werde dieser elektronischen Patientenakte widersprechen, um nicht die Kontrolle über meine Gesundsheitsdaten zu verlieren. Wir wissen aber, dass die wenigsten Patienten, die Fremdzugriffe auf ihre Daten ablehnen, tatsächlich das komplizierte Widerspruchsverfahren durchlaufen.

Wer der elektronischen Patientenakte oder ihrer Auswertung nicht insgesamt widerspricht, ermöglicht wohl zwangsweise auch einen grenzüberschreitenden Zugriff darauf durch ausländische Behandler, Forscher und Regierungen. Das von der Bundesregierung geplante Recht speziell grenzüberschreitenden Datenzugriffen widersprechen zu können, ist in der Verordnung nicht rechtssicher vorgesehen. Das widerspricht dem Interesse und Willen der Patienten, von denen laut Meinungsumfrage nur eine Minderheit einen grenzüberschreitenden europaweiten Zugriff auf ihre Patientenakte wünscht. Es wird auch der Sensibilität von Gesundheitsdaten nicht annähernd gerecht, die von Suchtkrankheiten, psychischen Störungen, Schwangerschaftsabbrüchen bis hin zu Geschlechtskrankheiten und Reproduktionsstörungen reichen.

Auch sonst ist die Verordnung im Sinne maximaler Verwertung unserer persönlicher Gesundheitsdaten und nicht im Interesse der Patienten ausgestaltet. Entgegen der ursprünglichen Position des Europäischen Parlaments müssen die sensiblen Gesundheitsdaten beispielsweise nicht in Europa gespeichert werden, so dass auch eine außereuropäische Speicherung etwa in US-Clouddiensten zulässig ist. Das EU-Parlament hat auch seine Forderung nach einer unabhängigen Zertifizierung der Sicherheit europäischer Gesundheitsdatensysteme aufgegeben.

Die EU-Verordnung fordert von Deutschland keinen besseren Schutz von Gesundheitsdaten als von der Bundesregierung beschlossen. Ohne Einwilligung des Patienten können behandelnde Ärzte europaweit künftig dessen komplette Patientenakte einsehen – es sei denn er widerspricht ausdrücklich. Ohne Einwilligung des Patienten erhalten künftig europaweit auch Gesundheitsministerien und Gesundheitsbehörden, Universitäten, zur Forschung, Produktentwicklung und zum Trainieren ‚künstlicher Intelligenz‘ auch Technologieunternehmen und Pharmaindustrie Zugang zu anonymisierten und personenbezogenen identifizierbaren (nur pseudonymisierten) Patientenakten – es sei denn der Patient widerspricht ausdrücklich. Kein Widerspruchsrecht gibt es in Deutschland gegen die Weitergabe medizinischer Registerdatensätze und von Abrechnungsdatensätzen. Um ihre Einwilligung werden Patienten nur vor Zugriffen auf ihre genetischen Daten gefragt, nicht aber etwa bei Informationen über Psycho- und Suchttherapien, Schwangerschaftsabbrüche, Sexual- und Reproduktionsstörungen. Für uns Piraten steht die Kontrolle des Patienten über seine Daten und der Schutz der ärztlichen Schweigepflicht im Mittelpunkt, so dass Zugriffe auf Behandlungsakten durch Dritte nur nach Nachfrage beim Patienten akzeptabel wären.”

Anja Hirschel, medizinische Informatikerin und Spitzenkandidatin der Piratenpartei für die Europawahl 2024, ergänzt: “Eine zentrale Datenspeicherung weckt Begehrlichkeiten in verschiedenste Richtungen. Wir sprechen dabei allerdings nicht nur von Hackerangriffen, sondern von der sogenannten Sekundärnutzung. Diese bezeichnet Zugriffe, die zu Forschungszwecke vollumfänglich gewährt werden sollen. Die Patientendaten sollen dann an Dritte weitergegeben werden. Aus Datenschutzsicht ist bereits das zentrale Ansammeln problematisch, bei Weitergabe wenigstens ein Opt-In Verfahren (aktive Einwilligung) richtig. Dies würde eine gewisse Entscheidungshoheit jedes Menschen über die persönlichen Daten ermöglichen. Wird allerdings nicht einmal ein Opt-Out Verfahren (aktiver Widerspruch) etabliert, so bedeutet dies letztlich die Abschaffung der Vertraulichkeit jeglicher medizinischer Information. Und das obwohl Ärzte in Deutschland gemäß § 203 StGB berufsständisch zurecht der Schweigepflicht unterliegen, wie u.a. auch Rechtsanwälte. Dieser Schutz unserer privatesten Informationen und das Recht auf vertrauliche Versorgung und Beratung stehen jetzt auf dem Spiel.”

Die Bundesregierung unterstützt die EU-Pläne. Kritik äußerten dagegen der europäische Verbraucherverband BEUC und das Datenschutznetzwerk EDRi.


Computerspiele-Sterben: Piraten schalten EU-Kommission ein

Der Europaabgeordnete der Piratenpartei Dr. Patrick Breyer hat die Europäische Kommission um eine Stellungnahme zu der Entscheidung des französischen Computerspieleherstellers Ubisoft gebeten, das beliebte Computerspiels “The Crew 1” ab April 2024 …

Der Europaabgeordnete der Piratenpartei Dr. Patrick Breyer hat die Europäische Kommission um eine Stellungnahme zu der Entscheidung des französischen Computerspieleherstellers Ubisoft gebeten, das beliebte Computerspiels “The Crew 1” ab April 2024 unbenutzbar zu machen. Diese Maßnahme könnte nach Auffassung von Breyer gegen EU-Recht verstoßen. Hintergrund der Anfrage ist eine Initiative rund um den Youtuber Ross Scott gegen die Zerstörung von Videospielen (stopkillinggames.com).

Der Europaabgeordnete Dr. Patrick Breyer erklärt:

“Die Abschaltung von ‘The Crew 1’ durch Ubisoft ist ein alarmierendes Beispiel dafür, wie die Interessen der Spieler ignoriert werden. Es ist inakzeptabel und wahrscheinlich auch illegal, dass Unternehmen beliebte Spiele erst gewinnbringend verkaufen und dann kurzfristig lahmlegen. Wir Piraten fordern eine allgemeine gesetzliche Regelung, dass verwaiste Software nicht einfach verschwinden darf. Die Community sollte die Möglichkeit haben, sie zu übernehmen und weiterzuentwickeln. Software ist mehr als nur ein Produkt im Regal. Gerade Computerspiele sind integraler Bestandteil unserer digitalen Kultur. Auf meine Anfrage muss jetzt die EU-Kommission klären, ob Ubisofts Vorgehen im Widerspruch zum geltenden EU-Verbraucherschutzrecht steht und welche Grenzen den Herstellern von Computerspielen bei der Außerfunktionsetzung zuvor verkaufter Spiele gesetzt sind.”

Die Digitalexpertin und Spitzenkandidatin der Piratenpartei zur Europawahl Anja Hirschel ergänzt:

„Täglich kommen neue Videospiele auf den Markt, da klingt es zunächst banal, wenn eines davon wieder verschwindet. Es geht hierbei jedoch nicht nur um Fans, die nicht von ihrem Lieblingsspiel lassen können. Die Diskussion dreht sich vielmehr darum, was mit Produkten und Werken geschehen soll, wenn der Eigentümer kein Interesse mehr daran hat.

Das Grundprinzip ist nicht neu: Die Schutzfristen im Urheberrecht, die nach dem Tod des Autors ablaufen haben dafür gesorgt, dass u.a. bedeutende Werke der Literatur der Öffentlichkeit weiter zur Verfügung stehen können. Auch wenn längst kein nennenswerter wirtschaftlicher Nutzen mehr aus ihnen zu ziehen ist.

Entsprechend sollen sog. ‚verwaiste Werke‘, und dazu gehört auch Software, weiterhin nutzbar sein. Hierzu sollen Sie frei öffentlich verfügbar werden. Dieser Grundansatz fördert zudem die Nachhaltigkeit, indem z.B. dann auch Steuerungsprogramme für Hardware weiter erhalten und gepflegt werden können, was die Lebensdauer der Geräte erhöht. “

Breyers schriftliche Anfrage an die EU-Kommission vom Freitag im Wortlaut:

„Der französische Computerspielehersteller ubisoft hat bis Dezember 2023 das Spiel The Crew 1 vertrieben. Durch Abschaltung seiner Server soll es aber ab April 2024 nicht mehr spielbar sein.

Verstößt dieses Vorgehen gegen EU-Recht? Welche Grenzen setzt EU-Recht allgemein den Herstellern von Computerspielen bei der Außerfunktionsetzung zuvor verkaufter Computerspiele? Welche Stellen sind für die Durchsetzung der Vorgaben zuständig?“

Unterdessen empfiehlt die Gamer-Initiative betroffenen deutschen Spielern eine Beschwerde bei der Verbraucherzentrale.


Damien Bod

BFF secured ASP.NET Core application using downstream API and an OAuth client credentials JWT

This article shows how to implement a web application using backend for frontend security architecture for authentication and consumes data from a downstream API protected using a JWT access token which can only be accessed using an app-to-app access token. The access token is acquired using the OAuth2 client credentials flow and the API does […]

This article shows how to implement a web application using backend for frontend security architecture for authentication and consumes data from a downstream API protected using a JWT access token which can only be accessed using an app-to-app access token. The access token is acquired using the OAuth2 client credentials flow and the API does not accept user access tokens from the UI application. OpenIddict is used as the OpenID Connect server. The STS provides both the OAuth2 client and the OpenID Connect client as well as the scope definitions.

Code: https://github.com/damienbod/bff-aspnetcore-angular-downstream-api

The BFF web application is implemented using ASP.NET Core and Angular as the UI tech stack. The Angular part of the web application can only use the ASP.NET Core APIs and secure same site cookies are used to protect the access. The whole application is authenticated using an OpenID Connect confidential code flow client (PKCE). If the web application requires data from the downstream API, a second OAuth client credentials flow is used to acquire the access token. The downstream API does not accept the user delegated access tokens from the UI application.

BFF OIDC code flow client

Implementing the OpenID Connect confidential client is really simple in ASP.NET Core. The AddAuthentication method is used with cookies and OpenID Connect. The cookies are used to store the session and the OpenID Connect is used for the challenge. All server rendered applications are setup like this with small changes required for the OIDC challenge. Due to these small differences, the different OIDC implementations provide specific implementations of the client. These are normally focused and optimized for the specific OIDC servers and do not work good with other OIDC servers. Once you use more than one OIDC server or require multiple clients from the same OIDC server, the client wrappers cause problems and you should revert back to the standards.

var stsServer = configuration["OpenIDConnectSettings:Authority"]; services.AddAuthentication(options => { options.DefaultScheme = CookieAuthenticationDefaults.AuthenticationScheme; options.DefaultChallengeScheme = OpenIdConnectDefaults.AuthenticationScheme; }) .AddCookie() .AddOpenIdConnect(options => { configuration.GetSection("OpenIDConnectSettings").Bind(options); options.SignInScheme = CookieAuthenticationDefaults.AuthenticationScheme; options.ResponseType = OpenIdConnectResponseType.Code; options.SaveTokens = true; options.GetClaimsFromUserInfoEndpoint = true; options.TokenValidationParameters = new TokenValidationParameters { NameClaimType = "name" }; });

Yarp Proxy

The Angular UI can only request data from the ASP.NET Core backend using secure http only cookies. The Angular UI is deployed as part of the ASP.NET Core application in production builds. When creating applications, software developers need to use their preferred tools and YARP is used to support this in the development setup. As a further downstream API is used, YARP can also be used to support this. The proxy takes the API request, validates the cookie, uses another access token and forwards the request to the downstream API. YARP has an ITransformProvider interface which is used to implement this. This also means we have two different YARP configuration setups for development and deployments. (test, integration, production).

using System.Net.Http.Headers; using Yarp.ReverseProxy.Transforms; using Yarp.ReverseProxy.Transforms.Builder; namespace BffOpenIddict.Server.ApiClient; public class JwtTransformProvider : ITransformProvider { private readonly ApiTokenCacheClient _apiTokenClient; public JwtTransformProvider(ApiTokenCacheClient apiTokenClient) { _apiTokenClient = apiTokenClient; } public void Apply(TransformBuilderContext context) { if (context.Route.RouteId == "downstreamapiroute") { context.AddRequestTransform(async transformContext => { var access_token = await _apiTokenClient.GetApiToken( "CC", "dataEventRecords", "cc_secret"); transformContext.ProxyRequest.Headers.Authorization = new AuthenticationHeaderValue("Bearer", access_token); }); } } public void ValidateCluster(TransformClusterValidationContext context) { } public void ValidateRoute(TransformRouteValidationContext context) { } }

The AddReverseProxy is used to add the YARP services.

builder.Services.AddReverseProxy() .LoadFromConfig(builder.Configuration.GetSection("ReverseProxy")) .AddTransforms<JwtTransformProvider>();

And the middleware:

app.MapReverseProxy(); API client credentials client

The YARP proxy uses the OAuth client credentials client to get an access token to access the downstream API. The token is stored in a cache and only rotated when it expires or is missing. The app-to-app security has nothing to do with the delegated client from the web application.

using IdentityModel.Client; using Microsoft.Extensions.Caching.Distributed; namespace BffOpenIddict.Server.ApiClient; public class ApiTokenCacheClient { private readonly ILogger<ApiTokenCacheClient> _logger; private readonly HttpClient _httpClient; private static readonly object _lock = new(); private readonly IDistributedCache _cache; private readonly IConfiguration _configuration; private const int cacheExpirationInDays = 1; private class AccessTokenItem { public string AccessToken { get; set; } = string.Empty; public DateTime ExpiresIn { get; set; } } public ApiTokenCacheClient( IHttpClientFactory httpClientFactory, ILoggerFactory loggerFactory, IConfiguration configuration, IDistributedCache cache) { _httpClient = httpClientFactory.CreateClient(); _logger = loggerFactory.CreateLogger<ApiTokenCacheClient>(); _cache = cache; _configuration = configuration; } public async Task<string> GetApiToken(string api_name, string api_scope, string secret) { var accessToken = GetFromCache(api_name); if (accessToken != null) { if (accessToken.ExpiresIn > DateTime.UtcNow) { return accessToken.AccessToken; } else { // remove => NOT Needed for this cache type } } _logger.LogDebug("GetApiToken new from STS for {api_name}", api_name); // add var newAccessToken = await GetApiTokenInternal(api_name, api_scope, secret); AddToCache(api_name, newAccessToken); return newAccessToken.AccessToken; } private async Task<AccessTokenItem> GetApiTokenInternal(string api_name, string api_scope, string secret) { try { var disco = await HttpClientDiscoveryExtensions.GetDiscoveryDocumentAsync( _httpClient, _configuration["OpenIDConnectSettings:Authority"]); if (disco.IsError) { _logger.LogError("disco error Status code: {discoIsError}, Error: {discoError}", disco.IsError, disco.IsError); throw new ApplicationException($"Status code: {disco.IsError}, Error: {disco.Error}"); } var tokenResponse = await HttpClientTokenRequestExtensions.RequestClientCredentialsTokenAsync(_httpClient, new ClientCredentialsTokenRequest { Scope = api_scope, ClientSecret = secret, Address = disco.TokenEndpoint, ClientId = api_name }); if (tokenResponse.IsError || tokenResponse.AccessToken == null) { _logger.LogError("tokenResponse.IsError Status code: {tokenResponseIsError}, Error: {tokenResponseError}", tokenResponse.IsError, tokenResponse.Error); throw new ApplicationException($"Status code: {tokenResponse.IsError}, Error: {tokenResponse.Error}"); } return new AccessTokenItem { ExpiresIn = DateTime.UtcNow.AddSeconds(tokenResponse.ExpiresIn), AccessToken = tokenResponse.AccessToken }; } catch (Exception e) { _logger.LogError("Exception {e}", e); throw new ApplicationException($"Exception {e}"); } } private void AddToCache(string key, AccessTokenItem accessTokenItem) { var options = new DistributedCacheEntryOptions() .SetSlidingExpiration(TimeSpan.FromDays(cacheExpirationInDays)); lock (_lock) { _cache.SetString(key, System.Text.Json.JsonSerializer.Serialize(accessTokenItem), options); } } private AccessTokenItem? GetFromCache(string key) { var item = _cache.GetString(key); if (item != null) { return System.Text.Json.JsonSerializer.Deserialize<AccessTokenItem>(item); } return null; } }

Downstream API

The downstream API is protected using JWT access tokens. This is setup using the AddJwtBearer method. The scope and other claims should also be validated.

services.AddAuthentication() .AddJwtBearer("Bearer", options => { options.Audience = "rs_dataEventRecordsApi"; options.Authority = "https://localhost:44318/"; options.TokenValidationParameters = new TokenValidationParameters { ValidateIssuer = true, ValidateAudience = true, ValidateIssuerSigningKey = true, ValidAudiences = ["rs_dataEventRecordsApi"], ValidIssuers = ["https://localhost:44318/"], }; }); Notes

This setup can be used for all server rendered applications. You should always use an external identity provider in enterprise setups and never roll out your own identity system as this is expensive to maintain and many enterprise environments no longer accept this due to the extra operation costs. Replacing Angular with react, Vue.js, Svelte or Blazor WASM does not require changes to the authentication. The different UI have differences on how the scripts are loaded or used and some require weaker session security setups.

You should also avoid downstream APIs if not required. Modular monoliths have performance advantages.

Links

https://github.com/damienbod/bff-aspnetcore-angular

https://learn.microsoft.com/en-us/aspnet/core/introduction-to-aspnet-core

https://nx.dev/getting-started/intro

https://github.com/isolutionsag/aspnet-react-bff-proxy-example

https://github.com/openiddict

https://github.com/damienbod/bff-auth0-aspnetcore-angular

https://github.com/damienbod/bff-azureadb2c-aspnetcore-angular

https://github.com/damienbod/bff-aspnetcore-vuejs

https://github.com/damienbod/bff-MicrosoftEntraExternalID-aspnetcore-angular

https://microsoft.github.io/reverse-proxy/articles/transforms.html

https://github.com/microsoft/reverse-proxy/tree/main/samples/ReverseProxy.Transforms.Sample

Sunday, 07. April 2024

Ben Werdmüller

Making version noir

This is completely lovely: a responsive, noir-inspired personal homepage in the form of a comic. I'm inspired. #Indieweb [Link]

This is completely lovely: a responsive, noir-inspired personal homepage in the form of a comic. I'm inspired. #Indieweb

[Link]


The Thing That's Coming

An interesting opinion piece about the collapse of the Francis Scott Key Bridge, the aftermath thereof, and how it all relates to the election. "But even if we had performed adequate maintenance, the bridge was probably doomed. Dali is the size and mass of a skyscraper (far larger than container ships used to be permitted to be, but larger ships lower prices in supply chain

An interesting opinion piece about the collapse of the Francis Scott Key Bridge, the aftermath thereof, and how it all relates to the election.

"But even if we had performed adequate maintenance, the bridge was probably doomed. Dali is the size and mass of a skyscraper (far larger than container ships used to be permitted to be, but larger ships lower prices in supply chains, and lower prices in supply chains help profits, and profits are important)."

And:

"DEI is just diversity, equity, and inclusion, by the way. That's all it is. It's become the new word that racists say when they want to say a slur but they realize they're in mixed company. It's a handy watchword for people who have decided that every problem is the result of the proximal existence of Black people and other marginalized people groups, because what they actually intend is to end the existence of such people, as soon as they can, with as much violence as possible."

And:

"Things are already very very bad for a great number of people in this country; institutional supremacy sees to that, and this supremacy is mostly accommodated by power—not only by openly fascist power like the cabal of creepy Christian weirdos who want to control everyone's bodies, but by run-of-the mill power, because run-of-the-mill power is interested in keeping things as they are, and mostly recognizes supremacy as what it is, which is the way things are." #Democracy

[Link]


The US aims to 'crack the code' on scaling up geothermal energy production

This hadn't really been on my radar: "Just one type of next generation geothermal — called superhot rock energy, where deep drilling reaches temperatures 400 degrees Celsius (752 degrees Fahrenheit) or hotter — is abundant enough to theoretically fulfill the world’s power requirements. In fact, just 1 percent of the world’s superhot rock potential could provide 63 terawatts

This hadn't really been on my radar:

"Just one type of next generation geothermal — called superhot rock energy, where deep drilling reaches temperatures 400 degrees Celsius (752 degrees Fahrenheit) or hotter — is abundant enough to theoretically fulfill the world’s power requirements. In fact, just 1 percent of the world’s superhot rock potential could provide 63 terawatts of clean firm power, which would meet global electricity demand nearly eight times over."

What's absolutely fascinating to me is the idea that fracking techniques could be used to unlock geothermal energy. Is that good? Fracking has negative side effects that go beyond the carbon footprint from oil and gas. On the other hand, of course, moving away from fossil fuels is obviously great. #Climate

[Link]


Wrench in the Gears

Arkansas Bound – New York Trip Quick Summary With Photos

Well, our house sold, and it looks like my next journey is about to begin. I decided to leave town and let the realtor do his thing. On April Fool’s Day we had multiple offers. Who knew that vintage charm held such sway in a world where the incessant drum beat of emerging tech tries [...]

Well, our house sold, and it looks like my next journey is about to begin. I decided to leave town and let the realtor do his thing. On April Fool’s Day we had multiple offers. Who knew that vintage charm held such sway in a world where the incessant drum beat of emerging tech tries to drown out everything else? The process was about as quick and painless as putting one’s life on display with money attached can be. Please know how much I appreciate all of you who have been sending good thoughts and prayers my way. It’s not an easy transition, but the long goodbye is coming to a close. I’ll be taking many boxes of books down to Arkansas by U-Haul next week – priorities, right? Even though the house sale went faster than I ever expected, I continued my planned trip through Syracuse, Oneida, Palmyra, Rochester, Burlington, and Montpelier. It was cold and rainy interspersed with sleet and snow, which made the stretch through the Adirondacks even more picturesque. I’m going to imagine it as a prolonged cleanse of my soul in preparation for the road ahead.

I did a brief overview video tonight, because I was afraid of not having time to do a more polished presentation. For the next month my focus has to be on clearing out the house and moving everything into storage. I did forget to mention my first stop at SUNY Binghamton on the ride up, the stomping grounds for Jeffrey Yass where he started his gambling habit. I’m going to drop some photos in this post, so you can see highlights from the trip so far. These aren’t my best hearts, because materials were limited, the weather was challenging, and the wind wanted to blow everything all over the place. Still, my good intentions were there even if my cold fingers and water drops on my glasses made the presentation less than polished. I do plan to stop at Ben and Jerry’s tomorrow – one of the first benefit corporations.

 

 

SUNY Binghamton, where Jeffrey Yass of Susquehanna International Group refined his gambling technique.

Oneida Community Mansion, Bible Communism, free love, divination, and eugenics in the context of corporate morality, with some amazing trees!

 

Cumorah and Joseph Smith’s Sacred Grove in Palmyra, NY –

Strong Memorial Hospital was the first “Flexner” model hospital, underwritten George Eastman of Eastman Kodak. The hospital was used by the University of Rochester for nonconsensual radiation exposure experiments during the Manhattan Project. It also had one of the first integrated psychiatric units. The Terrence Building, now abandoned, was one of the largest state mental hospitals in the United states from the 1950s through the 1990s, and had over 1,000 beds.

Rochester Institute of Technology’s Magic Spell Lab instructing in immersive video game design, with sculpture installations outside.

Obelisk to the Fox Sisters, founders of the American spiritualist movement. Paid for by Sir Arthur Conan Doyle (of Sherlock Holmes fame) who was a member of the Society for Psychical Research. When he came to Rochester he attended the Plymouth Street Spiritualist Church, which has since been relocated to the neighborhood of the Eastman mansion.

The relocated church.

The remains of the foundation of the cottage where the Fox sisters encountered the spirit of the murdered peddler.

The George Eastman, of Kodak, mansion, garden courtyard. These are wisteria pods.

Rainbow-themed Peace Park along Lake Ontario on land donated by George Eastman.

The very dark and threatening Xerox building in downtown Rochester.

Bausch and Lomb world headquarters located opposite the Xerox building.

Strong Museum of Play with International Video Game Hall of Fame and Digital Worlds “Level-Up” Experience

 

Portal?

Triskelion with water and sphere outside the entrance to the museum.

Time bending reference in the divination toy display.

 

 

MLK Park Opposite Strong Museum in Rochester

 

Derelict Kodak buiding with short dumping in the parking lot – boat and tires.

Seneca Park Zoo that had a partnership with IXO Foundation on conservation impact data. Naked mole rats are mammals that live in a colony like eusocial insects.

On the way to Vermont I drove through Rome, NY (archetypal field?) and spent some time driving around the former Griffiss Air Force Base, which specialized in electronic warfare. It is being redeveloped as a public-private partnership defense-tech mini-city with stack and pack housing, a private international airport, and huge airplane hangers that have been converted to incubator space for quantum computing and nano-tech R&D. Oh, and there’s a sculpture garden with disc golf featuring many mythological characters. I had to leave a few hearts there, too. Note the dramatic chrome Pegasus in the traffic circle. We are living in surreal times folks. Every day is like the Percy Jackson series…

Innovare, supported by the Griffiss Institute represented by an origami griffin.

https://www.innovare.org/ https://www.griffissinstitute.org/

Notice the vintage-vibe Alcoa sticker that was lying in the grass next to the statue. Freya is the Norse goddess of destiny. Her symbol is the distaff.

I made a wish.

 

Farrell Hall at the TRINITY campus of the University of Vermont in Burlington. The building houses the office of Joshua Bongard who assisted in the creation of Xenobots, “living” robots made from frog cells. The building is also home to the Gund Institute for the Environment. You can see the Gund Foundation’s involvement in early childhood impact finance, equity and justice finance, climate solutions, etc. etc. in the organization’s LittleSis entry.

 

 

 

 


Ben Werdmüller

Updates for ShareOpenly

I made some updates to ShareOpenly this weekend: The design now puts the shortlist of social networks you’re most likely to share to at the top. Firefish and Misskey are now supported. There are now instructions for adding “share to” ShareOpenly links to your site. I also added “share this post” links to the byline of every post on my site, including on the homepage and from indices. S

I made some updates to ShareOpenly this weekend:

The design now puts the shortlist of social networks you’re most likely to share to at the top. Firefish and Misskey are now supported. There are now instructions for adding “share to” ShareOpenly links to your site.

I also added “share this post” links to the byline of every post on my site, including on the homepage and from indices.

Support for more social networks, as well as a universal share icon, are in the works.

Saturday, 06. April 2024

Ben Werdmüller

Here's the column Meta doesn't want you to see

"On Thursday I reported that Meta had blocked all links to the Kansas Reflector from approximately 8am to 4pm, citing cybersecurity concerns after the nonprofit published a column critical of Facebook’s climate change ad policy. By late afternoon, all links were once again able to be posted on Facebook, Threads and Instagram–except for the critical column." Here it is. And

"On Thursday I reported that Meta had blocked all links to the Kansas Reflector from approximately 8am to 4pm, citing cybersecurity concerns after the nonprofit published a column critical of Facebook’s climate change ad policy. By late afternoon, all links were once again able to be posted on Facebook, Threads and Instagram–except for the critical column."

Here it is. And if this censorship is taking place, it's quite concerning:

"I had suspected such might be the case, because all the posts I made prior to the attempted boost seemed to drop off the radar with little response. As I took a closer look, I found others complaining about Facebook squelching posts related to climate change." #Media

[Link]

Friday, 05. April 2024

Doc Searls Weblog

Talking Artificial Intelligence with the Real Don Norman

Artificial is AI’s frst name. And Intelligence is a quality, not a quantity. You can’t measure it with a dipstick, a ruler, or an IQ test. If you could, you’d get the same result every time.* But being artificial doesn’t mean AI isn’t dangerous, fun or both. It is, and will be, what we make […]

Artificial is AI’s frst name. And Intelligence is a quality, not a quantity. You can’t measure it with a dipstick, a ruler, or an IQ test. If you could, you’d get the same result every time.*

But being artificial doesn’t mean AI isn’t dangerous, fun or both. It is, and will be, what we make of it.

That’s what Don Norman says, and he’s been publishing in AI journals since 1973. His laboratory produced the first multi-layer neural nets in the 1980s. He wrote Things that Make us Smart in 1993.

In the opinion of myself and countless others, Don is also the foremost authority on design—of anything and everything. For more on that, check out Don’s Web page, his Wikipedia page, and his books. Or, if you just want to sample some of his thoughts on AI, watch this.

Or you can skip all that and come to the good stuff: joining us in a talk with Don in the final salon of this semester on the topic of Artificial +/vs. Human Intelligence. It’s next Tuesday, April 9, at Noon Eastern time. (That’s less than 24 hours after the shadow of the Moon passes over the Indiana University campus. Yes, totality will be local here.)

Also, this won’t be a lecture or a presentation. It will be a lively discussion because Don is especially good at that.

It’s also free and online, but you have to register first. Do that here.

*For what it’s worth, my own known IQ test scores have an 80-point range. I’ve written about that, and the myth of “IQ” here, here, here, here, and I suppose in too many other places.


Fishing For Free TV Signals

I think I will be the last person in Bloomington to try getting free over-the-air TV from what’s left of all the major networks. But that’s just my style, so roll with me while I explain how I’m hoping to do it, with the antenna above, which I’ll need because here is what the Search […]
By expert acclaim, this is the best antenna for receiving hard-to-get over-the-air (OTA) TV signals

I think I will be the last person in Bloomington to try getting free over-the-air TV from what’s left of all the major networks. But that’s just my style, so roll with me while I explain how I’m hoping to do it, with the antenna above, which I’ll need because here is what the Search Map at RabbitEars.info says we might get here:

We live next door right now, and the top station above, WTIU from Indiana University (our PBS affiliate), comes from a tower you can walk to from here. We can get that signal by using a straightened paper clip for an antenna. (You jam the clip into the center hole of the coaxial connector in the back of the TV.) Even a real indoor antenna connected to the same jack gets nothing else, not even the two stations above with “Fair” signal strength.

But this Televes antenna might do the job because we’re on the slope of a hill that faces the Indianapolis stations that carry CBS (WTTV/4 on 27), ABC (WRTV/6 on 25), NBC (WTHR/13 on 13), and Fox (WRDB/41 on 32)*. These range from 27 to 54 miles away, in roughly the same direction. VHF and UHF signals always gain strength when they hit the faces of hills, similar to how surf builds as it approaches a sand bar or a shore. Also, the Televes DAT BOSS antenna gets great reviews:

TechHive: Televes Dat Boss Mix LR review: This is a great outdoor antenna Tyler the Antenna Guy: Televes DATBOSS LR Mix Outdoor Antenna Review 149883 Solid Signal: ALL NEW Televes DATBOSS Mix LR Antenna TESTED (w/assembly instructions) Amazon: Televes DAT Series BOSS Mix LR Outdoor High-VHF/UHF HDTV Antenna (see the reviews)

I was going to put it in our new attic before the drywall goes up. However, the attic space is low and full of close cross-braces. Worse, the antenna is not small and kinda complicated to fit in a space that’s a web of short 2x4s. Dig:

So it will go on a pole in the backyard and feed a coaxial line that will tunnel through conduit under the yard and inside to the new living room.

But I would like to test it first, preferably with a tuner gizmo I can plug into my laptop. I had one of those for years: the Elgato EyeTV Hybrid TV Tuner stick, which looked like a fat thumb drive,with USB-A at one end and a coax connector for an antenna at the other. It was sold in the ’00s and picked up both analog and digital TV (the Digital Transition was happening then), on every North American channel, and came with good software that ran on Macs and operating systems that have long been abandoned. Far as I can tell there are no replacements that run on current hardware or operating system, other than this one sold in Europe. Far as I can tell, it only works on TV bands over there. But I could be wrong. If anybody knows of a gizmo/softward combo I can use, please tell me. My only other option is to buy or find a cheap TV and try that out. Any advice is welcome. Thanks!

*After the digital transition in 2008, and again with the “repack” after 2016, most TV stations moved onto channels other than their original ones, using less spectrum overall. All the TV channels above 36 were auctioned off, first in 2008 and again in 2018. Most buyers were cellular and other short-range wireless carriers, which have been repurposing the old TV spectrum for 5G and other modern uses. The only station in Indianapolis that didn’t move its channel position was WTHR/13. That one is listed in the chart above as one of the “bad” signals for this location. The Televes antenna is designed specifically for “high band” VHF (channels 7-13) and the remaining UHF (14 to 36) TV channels. It also filters out any 5G signals that the antenna might pick up on what used to be the higher UHF channels. By the way, the old “low band” VHF channels (2 to 6) are still in use in some places, but by very few TV stations.  So it’s not worth it for Televes to design an antenna to pick those channels up. Such an antenna would also be a lot bigger and longer because the low-band elements of the antenna would be much longer.

Thursday, 04. April 2024

Ben Werdmüller

Footage and stories from Gaza are heart-wrenching. ...

Footage and stories from Gaza are heart-wrenching. The systematic killing of aid workers is just a small part of the atrocities being committed over there. Hamas is not a force for good in the region but almost all of these people are civilians. There's no way to justify this. There must be a ceasefire. Now.

Footage and stories from Gaza are heart-wrenching. The systematic killing of aid workers is just a small part of the atrocities being committed over there. Hamas is not a force for good in the region but almost all of these people are civilians. There's no way to justify this.

There must be a ceasefire. Now.


Doc Searls Weblog

Feed Time

Two things worth blogging about that happened this morning. One was getting down and dirty trying to make DALL-E 3 work. That turned into giving up trying to find DALL-E (in any version) on the open Web and biting the $20/month bullet for a Pro account with ChatGPT, which for some reason maintains its DALL-E […]
I asked ChatGPT to give me “people eating blogs” and got this after it suggested some details.

Two things worth blogging about that happened this morning.

One was getting down and dirty trying to make DALL-E 3 work. That turned into giving up trying to find DALL-E (in any version) on the open Web and biting the $20/month bullet for a Pro account with ChatGPT, which for some reason maintains its DALL-E 3 Web page while having “Try in ChatGPT︎” on that page link to the ChatGPT home page rather than a DALL-E one. I gather that the free version of DALL-E is now the one you get at Microsoft’s Copilot | Designer, while the direct form of DALL-E is what you get when you prompt ChatGPT (now 4.0 for Pro customers… or so I gather) to give you an image that credits nothing to DALL-E.

The other thing was getting some great help from Dave Winer in putting the new Feedroll category of my Feedland feeds placed on this blog, in a way similar stylistically to old-fashioned blogrolls (such as the one here). You’ll find it in the right column of this blog now. One cool difference from blogrolls is that the feedroll is live. Very cool. I’m gradually expanding it.

Meanwhile, after failing to get ChatGPT or Copilot | Designer to give me the image I needed on another topic (which I’ll visit here later) I prompted them to give me an image that might speak to a feedroll of blogs. ChatGPT gave me the one above, not in response to “people eating blogs” (my first attempt), but instead to “People eating phone, mobile and computer screens of type.” Microsoft | Designer gave me these:

Redraw your own inconclusions.


Ben Werdmüller

I really hope San Francisco stays an ...

I really hope San Francisco stays an idealistic, progressive city and doesn't succumb to centrism. There are plenty of other places for people who want a city run by those values to live. San Francisco is, and has always been, special.

I really hope San Francisco stays an idealistic, progressive city and doesn't succumb to centrism. There are plenty of other places for people who want a city run by those values to live. San Francisco is, and has always been, special.


Doc Searls Weblog

Death is a Feature

When Parisians got tired of cemeteries during the French Revolution, they conscripted priests to relocate bones of more than six million deceased forebears to empty limestone quarries below the city: a hundred miles of rooms and corridors now called The Catacombes. It was from those quarries that much of the city’s famous structures above—Notre Dame, et. al.—were built […]
When Parisians got tired of cemeteries during the French Revolution, they conscripted priests to relocate bones of more than six million deceased forebears to empty limestone quarries below the city: a hundred miles of rooms and corridors now called The Catacombes. It was from those quarries that much of the city’s famous structures above—Notre Dame, et. al.—were built in prior centuries, using a volume of extracted rock rivaling that of Egypt’s Great Pyramids. That rock, like the bones of those who extracted it, was once alive. In the shot above, shadows of future fossils (including moi) shoot the dead with their cell phones.

Elon Musk wants to colonize Mars.

This is a very human thing to want. But before we start following his lead, we might want to ask whether death awaits us there.

Not our deaths. Anything’s. What died there to make life possible for what succeeds it?

From what we can tell so far, the answer is nothing.

To explain why life needs death, answer this: what do plastic, wood, limestone, paint, travertine, marble, asphalt, oil, coal, stalactites, peat, stalagmites, cotton, wool, chert, cement, nearly all food, all gas, and most electric services have in common?

They are all products of death. They are remains of living things or made from them.

Consider this fact: about a quarter of all the world’s sedimentary rock is limestone, dolomite and other carbonate rocks: remains of beings that were once alive. The Dolomites of Italy, the Rock of Gibraltar, the summit of Mt. Everest, all products of death.

Even the iron we mine has a biological source. Here’s how John McPhee explains it in his Pulitzer-winning Annals of the Former World:

Although life had begun in the form of anaerobic bacteria early in the Archean Eon, photosynthetic bacteria did not appear until the middle Archean and were not abundant until the start of the Proterozoic. The bacteria emitted oxygen. The atmosphere changed. The oceans changed. The oceans had been rich in dissolved ferrous iron, in large part put into the seas by extruding lavas of two billion years. Now with the added oxygen the iron became ferric, insoluble, and dense. Precipitating out, it sank to the bottom as ferric sludge, where it joined the lime muds and silica muds and other seafloor sediments to form, worldwide, the banded-iron formations that were destined to become rivets, motorcars and cannons. The is the iron of the Mesabi Range, the Australian iron of the Hammerslee Basin, the iron of Michigan, Wisconsin, Brazil. More than ninety percent of the iron ever mined in the world has come from Precambrian banded-iron formations. Their ages date broadly from twenty-five hundred to two thousand million years before the present. The transition that produced them — from a reducing to an oxidizing atmosphere and the associated radical change in the chemistry of the oceans — would be unique. It would never repeat itself. The earth would not go through that experience twice.

Death produces building and burning materials in an abundance that seems limitless, at least from standpoint of humans in the here and now. But every here and now ends. Realizing that is a vestigial feature of human sensibility.

Take for example, The World Has Plenty of Oil, which appeared in The Wall Street Journal ten years ago. In it, Nansen G. Saleri writes, “As a matter of context, the globe has consumed only one out of a grand total of 12 to 16 trillion barrels underground.” He concludes,

The world is not running out of oil any time soon. A gradual transitioning on the global scale away from a fossil-based energy system may in fact happen during the 21st century. The root causes, however, will most likely have less to do with lack of supplies and far more with superior alternatives. The overused observation that “the Stone Age did not end due to a lack of stones” may in fact find its match.

The solutions to global energy needs require an intelligent integration of environmental, geopolitical and technical perspectives each with its own subsets of complexity. On one of these — the oil supply component — the news is positive. Sufficient liquid crude supplies do exist to sustain production rates at or near 100 million barrels per day almost to the end of this century.

Technology matters. The benefits of scientific advancement observable in the production of better mobile phones, TVs and life-extending pharmaceuticals will not, somehow, bypass the extraction of usable oil resources. To argue otherwise distracts from a focused debate on what the correct energy-policy priorities should be, both for the United States and the world community at large.

In the long view of a planet that can’t replace any of that shit, this is the rationalization of a parasite. That this parasite can move on to consume other irreplaceable substances it calls “resources” does not make its actions any less parasitic.

Or, correctly, saprophytic; since a saprophyte is “an organism which gets its energy from dead and decaying organic matter.”

Moving on to coal, the .8 trillion tons of it in Wyoming’s Powder River Basin now contributes 40% of the fuel used in coal-fired power plants in the U.S. Here’s the biggest coal mine in the basin, called Black Thunder, as it looked to my camera in 2009:

About half the nation’s electricity is produced by coal-fired plants, the largest of which can eat the length of a 1.5-mile long coal train in just 8 hours. In Uncommon Carriers, McPhee says Powder River coal at current rates will last about 200 years.

Then what? Nansen Saleri thinks we’re resourceful enough to get along with other energy sources after we’re done with the irreplaceable kind.

I doubt it.

Wind, tide, and solar are unlikely to fuel aviation, though I suppose fresh biofuel might. Still, at some point, we must take a long view, or join our evolutionary ancestors in the fossil record faster than we might otherwise like.

As I fly in my window seat from place to place, especially on routes that take me over arctic, near-arctic, and formerly arctic locations, I see more and more of what geologists call “the picture”: a four-dimensional portfolio of scenes in current and former worlds. Thus, when I look at the seashores that arc eastward from New York City— Long Island, Block Island, Martha’s Vineyard, Nantucket, Cape Cod—I see a ridge of half-drowned debris scraped off a continent and deposited at the terminus of an ice cap that began melting back toward the North Pole only 18,000 years ago—a few moments before the geologic present. Back then, the Great Lakes were still in the future, their basins covered by ice that did not depart from the lakes’ northern edges until about 7,000 years ago or 5,000 B.C.

Most of Canada was still under ice while civilization began in the Middle East and the first calendars got carved. Fly over Canada often enough and the lakes appear to be exactly what they are: puddles of a recently melted cap of ice. Same goes for most of the ponds around Boston. Every inland swamp in New England and upstate New York was a pond only a few dozen years ago, and was ice only a dozen or so centuries before that. Go forward a few thousand years and all of today’s ponds will be packed with accumulated humus and haired over by woods or farmland. In the present, we are halfway between those two conditions. Here and now, the last ice age is still ending.

As Canada continues to thaw, one can see human activity spark and spread across barren lands, extracting “resources” from ground made free of permafrost only in the last few years. Doing that is both the economic and the pestilential thing to do.

On the economic side, we spend down the planet’s principal, and fail to invest toward interest that pays off for the planet’s species. That the principal we spend has been in the planet’s vaults for millions or billions of years, and in some cases cannot be replaced, is of little concern to those spending it, which is roughly all of us.

Perhaps the planet looks at our species the same way and cares little that every species is a project that ends. Still, in the meantime, from the planet’s own one-eyed perspective, our species takes far more than it gives, and with little regard for consequences. We may know, as Whitman put it, the amplitude of time. We also tend to assume in time’s fullness all will work out.

But it won’t.

Manhattan schist, the bedrock anchoring New York City’s tallest buildings, is a little over half a billion years old. In about the same amount of time, our aging Sun, growing hotter, will turn off photosynthesis. A few billion years later, the Sun will swell into a red giant with a diameter wider than Earth’s orbit, roasting the remains of our sweet blue planet and scattering its material out into the cosmos, perhaps for eventual recycling by stars and planets not yet formed.

In a much shorter run, many catastrophes will happen. One clearly is what our species is already doing to the planet during what geologists correctly call the Anthropocene. I suppose that’s a good reason for Elon and crew to “save” a few members of our vain little species. But why fuck up Mars before we’re done fucking up Earth, when there’s still some leverage with the death we have at home and that Mars won’t begin to have until stuff dies on it?

I’ve always been both an optimist and a realist. Specifically, I’m an optimist for at least the short run, by which I mean the next few dozen years. But I’m a pessimist for our civilization — and our species. Death is always a winning bet.

But hey, maybe nature knows better what to do with us than we do.

The first ancestor of this piece appeared in blogs.harvard.edu on 4 March 2008. The second is here on Medium.


Looking for DALL-E 3 Help

I just returned to DALL-E 3 after using its Microsoft version (currently called Copilot | Designer) for a while. But I can’t get in. See how it says “Try in ChatGPT↗︎?” When I do that, it goes to https://chat.openai.com/. After I log in there, it offers no clue about where DALL-E 3 is. So I […]

I just returned to DALL-E 3 after using its Microsoft version (currently called Copilot | Designer) for a while. But I can’t get in.

See how it says “Try in ChatGPT︎?” When I do that, it goes to https://chat.openai.com/. After I log in there, it offers no clue about where DALL-E 3 is. So I asked, “How do I get to the DALL-E 3 page after I’m logged in here?”

ChatGPT answered, “As of my last update in January 2022, DALL-E 3 is a hypothetical iteration of the DALL-E model, and there’s no information about its existence or how to access it. If it were to be released, you might typically find access through OpenAI’s website or platform if they make it available to the public. However, without further updates or announcements, it’s challenging to provide specific instructions. If OpenAI releases DALL-E 3 or any similar model in the future, you can check their official website or announcements for information on how to access it.”

Additional useless dialog followed. So now I’ll ask the Lazyweb. Does anyone know how to get into DALL-E 3?


Ben Werdmüller

72

It’s my mother’s birthday. She would be 72 today. The week we lost her, I wrote this piece, which I re-read today. In it, our friend Anita Hurrell remembered her like this: One time you drove us in the van to the seaside and we ate sandwiches with cucumber in them and I thought they tasted delicious and I felt this strong sense of deep content sitting with Hannah in the back listening to her

It’s my mother’s birthday. She would be 72 today.

The week we lost her, I wrote this piece, which I re-read today.

In it, our friend Anita Hurrell remembered her like this:

One time you drove us in the van to the seaside and we ate sandwiches with cucumber in them and I thought they tasted delicious and I felt this strong sense of deep content sitting with Hannah in the back listening to her singing and humming for the whole journey. I have no idea where we went, and in my head it was nowhere in England, but rather part of the big-hearted, loving, funny, relaxed, non-conformist world of your family in my childhood - full of your laughter and your enormous kindness.

[…] I look back and see […] what a true feminist you were, how much of parenting you seemed to do much better than we do these days, how generous and homemade and fun and kind the world you and Oscar [my dad] made was.

I feel so privileged to have had that childhood. To have had a mother like her.

In the piece I read at her memorial, I said:

Before I was born, both my parents were involved in struggles to support affirmative action and tenants’ rights. She described herself as having been radicalized early on, but it’s not particularly that she was radical: she could just see past the social templates that everyone is expected to adhere to, and which perpetuate systemic injustices, and could see how everything should operate to be fairer.

That was true on every level. She wanted she and [her siblings] to all be treated equally, and would make it known if she thought the others were getting a raw deal. She tried her best to treat Hannah and I equally. If someone made a sexist or a homophobic remark around her, she would call it out. If someone was xenophobic, or unthinkingly imperialist, she would bring it up. She was outspoken - always with good humor, but always adamant about what really mattered.

When our son was born, I wrote:

The last time I saw you, just over a year ago, you were in a bed in the same institution, your donated lungs breathing fainter and fainter. I kissed you on the forehead and told you I loved you. You’d told me that what you wanted to hear was us talking amongst ourselves; to know that we’d continue without you. In the end, that’s what happened. But I miss you terribly: I feel the grief of losing you every day, and never more than when my child was born.

[…] In this worse universe that doesn’t have you in it, I’ve been intentionally trying to channel you. I’ve been trying to imagine how you would have shown up with them, and what your advice for me would have been. I’ve been trying to convey that good-humored warmth I always felt. You made me feel safe: physically, yes, but more than that, emotionally. I want to make them feel safe, too: to be who they really are.

That first piece from three days after we lost her:

I want to honor her by furthering what she put into the world. The loving, non-conformist, irreverent, equity-minded spirit that she embodied.

Her values are the right ones, I’m sure of it — her non-conformity, her progressivism, her intellectual curiosity, her fearlessness and silliness (and fearless silliness), her gameness for adventure, her internationalism, her inclusive care and love for everybody and absolute disregard for the expectations other people had for her, or for nonsense tradition, for institutions, or for money.

She is the best person I’ve ever met and could ever hope to meet. I’ve been so, deeply sad, every day, but grief isn’t enough to honor her, and I wish I’d been better at doing that.

I miss you, Ma. I love you. I’m so, so sorry.

Wednesday, 03. April 2024

Ben Werdmüller

Why we invented a new metric for measuring readership

"One particular piece of the journalism model that is broken? How news organizations measure their readership." Pageviews are not a million miles away from hits - which is how we measured success in 2003. This is much-needed innovation from The 19th. Alexandra Smith, who wrote this piece and works on audience there, is brilliant and is a voice who should be listened to acr

"One particular piece of the journalism model that is broken? How news organizations measure their readership."

Pageviews are not a million miles away from hits - which is how we measured success in 2003. This is much-needed innovation from The 19th. Alexandra Smith, who wrote this piece and works on audience there, is brilliant and is a voice who should be listened to across journalism and beyond.

The trick isn't convincing a newsroom to consider these ideas. The real trick is to get funders and the broader ecosystem on board. But it's work that must be done. #Media

[Link]


A ProPublica Lawsuit Over Military Court Access Moves Forward

From my colleagues: "ProPublica has “plausibly alleged that the issued guidelines are clearly inconsistent with Congress’ mandate.” This is most apparent, the judge said, in the allegation that the Navy denies the public access to all records in cases that end in acquittals." ProPublica continues to do great work not just in its reporting, but in setting the groundwork for

From my colleagues: "ProPublica has “plausibly alleged that the issued guidelines are clearly inconsistent with Congress’ mandate.” This is most apparent, the judge said, in the allegation that the Navy denies the public access to all records in cases that end in acquittals."

ProPublica continues to do great work not just in its reporting, but in setting the groundwork for open reporting in the public interest. #Democracy

[Link]


Patrick Breyer

Europol-Bericht: Beharren auf Vorratsdatenspeicherung geht an Gefahren vorbei

Europol hat eine Bewertung der organisierten Internet-Kriminalität veröffentlicht (Internet Organised Crime Assessment (IOCTA) (PDF)). Der Bericht beschreibt unter anderem wie sich die Gefahren in diesem Bereich über die letzten zwei …

Europol hat eine Bewertung der organisierten Internet-Kriminalität veröffentlicht (Internet Organised Crime Assessment (IOCTA) (PDF)). Der Bericht beschreibt unter anderem wie sich die Gefahren in diesem Bereich über die letzten zwei Jahre verändert haben.

Dr. Patrick Breyer Europaabgeordneter (Piratenpartei /  Fraktion Grüne/Europäische Freie Allianz) und digitaler Freiheitskämpfer kommentiert:

„Europols Forderung nach anlassloser Vorratsdatenspeicherung entspricht nicht den Fakten. Der Bericht der Agentur beschreibt Gefahren, die mit mit Vorratsdatenspeicherung und jeder anderen Form
von pauschaler Massenüberwachung von Kommunikationsdaten von Bürgerinnen und Bürgern nicht angegangen werden können. Das sehen wir in Mitgliedsstaaten, in denen eine Vorratsdatenspeicherung seit Jahren in Kraft ist. Es ist Zeit, dass sich Europol und die Europäische Union neu ausrichten, indem sie sich auf gezielte Ermittlungen und die Stärkung der Zivilgesellschaft konzentrieren.“

Europols Rolle im “GoingDark”-Programm (#EUGoingDark)

In dem Bericht untersucht die Strafverfolgungsbehörde der Europäischen Union die Themen Cyberkriminalität als Dienstleistung („cybercrime-as-a-service“), Untergrundgemeinschaften („underground communities“), kriminelle Märkte für gestohlene Zugangsdaten, Daten von Betroffenen sowie Betrugsstrategien.

Als Mitglied der Expertengruppe für den Zugang zu Daten für eine wirksame Strafverfolgung, die auch als “Going Dark”-Programm bekannt ist hat Europol die Aufgabe, „zur Integration einer Strafverfolgungsperspektive, einschließlich der Anforderungen an den Schutz der Privatsphäre und den Datenschutz, in alle relevanten politischen Programme und -Maßnahmen der EU beizutragen (‘security by design’)“ (eigene Übersetzungen) und „zu untersuchen, wie ‘security by design’ zu einer Standardanforderung bei der Entwicklung neuer Technologien werden könnte.“ Die dringendsten Herausforderungen seien dabei: „Verschlüsselung (Zugang ‘en clair’ zu gespeicherten Inhalten und digitalen Kommunikationsdaten), Vorratsdatenspeicherung von Lokalisierungs- und Roamingdaten sowie Techniken zur Anonymisierung, einschließlich VPN und Darknet. (Ratsdokument 8281/23 PDF)

Europol ist ein starker Befürworter der Wiedereinführung von Pflichten zur wahllosen Speicherung von Kommunikations-Metadaten der Bürgerinnen und Bürger wie IP-Adressen. Im Jahr 2018 scheiterte die Agentur mit einer „Vorratsdatenspeicherungsmatrix“ – einem Vorschlag zur Wiedereinführung der Vorratsdatenspeicherung in der Europäischen Union. (WK 3005/2018 INIT PDF)

Organisierte Kriminelle können Vorratsdatenspeicherung umgehen – die meisten rechtstreuen Menschen nicht

Im vorgelegten Bericht stellt Europol „ein hohes Maß an Spezialisierung innerhalb krimineller Netzwerke“ fest. Die Agentur sieht sich mit der Entwicklung konfrontiert, dass „Straftäter ihre Handlungen und Identitäten verschleiern, da ihr Wissen über Gegenmaßnahmen zunimmt“.

Dr. Patrick Breyer kommentiert:

„Der Europol-Bericht bestätigt, dass eine pauschale Vorratsdatenspeicherung zur Bekämpfung der organisierten Kriminalität ungeeignet ist, weil sie leicht umgangen werden kann, zum Beispiel durch die Nutzung von Anonymisierungsdiensten. Was wir statt der Massenspeicherung von Kommunikations-Metadaten aller Bürger brauchen, sind schnelle, gut ausgestattete und gezielte Ermittlungen.“

Die Rolle von Internetdienstleistern

Europol beschreibt die Rolle von Internetdienstleistern als Dienstleister für Verbrechen: „Viele Internetdienstleiser (ISPs), die regelmäßig von Kriminiellen genutzt werden, engagieren sich nicht für umfassende Praktiken der Kundenüberwachung wie Know-Your-Customer-Verfahren und Speicherung von Kunden- und Metadaten, wie etwa IP-Adressen.“

Dr. Patrick Breyer kommentiert:

„Europol stellt Internetdienstanbieter unter Generalverdacht und scheint von ihnen zu erwarten, dass sie gegen Datenschutzgesetze verstoßen. Wir haben ein Recht auf anonyme Nutzung des Internets! Die Datenschutzrichtlinien von Internetanbietern ins Visier zu nehmen, ist so, als würde man Vermieter generell für häusliche Gewalt verantwortlich machen. Die Aufgabe und Pflicht von Internetdienstanbietern in Demokratien ist es, Bürgerinnen und Bürgern eine sichere und vertrauliche Kommunikation zu ermöglichen.“

Cyber-Grooming

Der Europol-Bericht erkennt, dass „Täter, die sexualisierte Gewalt gegen Kinder ausüben, soziale Medien ausgiebig nutzen, um mit ihren Opfern in Kontakt zu treten, wobei sie häufig unter falscher Identität mit ihnen interagieren. (…) Täter, die Kinder sexuell ausbeuten, pflegen (to groom) ein Verhältnis zu ihren Opfern, um an sensible Informationen zu gelangen, die dann zu Erpressungszwecken genutzt werden können.“

Dr. Patrick Breyer kommentiert:

„Der Bericht von Europol unterstreicht die Notwendigkeit einer besseren Aufklärung und Schulung von (potenziellen) Opfern, insbesondere von Kindern und Teenagern. Anstelle von Vorratsdatenspeicherung und anderen Mitteln der Massenüberwachung brauchen wir kompetentere und besser ausgestattete Präventionsarbeit, Schulung junger Menschen über Täterstrategien und deren Bekämpfung, anonyme Online-Beratung, Aufklärungsprogramme, datenschutzfreundliche Gestaltung von Social-Media-Plattformen und andere Maßnahmen die das Problem tatsächlich angehen.“

Datendiebstahl verhindern

Zu den wirtschaftlichen Aspekten der Cyberkriminalität stellt Europol fest: „gestohlene Daten sind die zentrale Ware dieser illegalen Ökonomie.“

Dr. Patrick Breyer kommentiert:

„Der Europol-Bericht betont die Bedeutung von Privatsphäre, Anonymität und Verschlüßelung , um Bürgerinnen und Bürger vor Identitätsdiebstahl und anderen Verbrechen zu schützen. Massenhafte Speicherung persönlicher Daten lädt zu Hack-Angriffen und Leaks geradezu ein.“


Ben Werdmüller

My Edinburgh

A friend asked me to recommend some places to visit in Edinburgh — not the big stuff (Arthur’s Seat, for example), but the small haunts and little delights that I used to love. I lived there for a long time, and think back on the city with a lot of fondness. But I’ve been in the US for 13 years now, and a lot will have diverged from the version of the city I have in my head. I thought I’d pub

A friend asked me to recommend some places to visit in Edinburgh — not the big stuff (Arthur’s Seat, for example), but the small haunts and little delights that I used to love. I lived there for a long time, and think back on the city with a lot of fondness. But I’ve been in the US for 13 years now, and a lot will have diverged from the version of the city I have in my head.

I thought I’d publish my list here. I’ve done my best to research whether these places still exist, but I make no guarantees. And, of course, there are new places that are probably even better than the ones I’ve listed here but aren’t even on my radar.

Pubs

You’ve got to start with pubs. They’re not so much bars as community spaces: open living rooms that serve as gathering points. Although, of course, Edinburgh has a drinking culture, and I would never recommend a place that didn’t have a real ale or two.

The Regent was always my favorite pub in Edinburgh: cozy, welcoming, lively. The walk there, either from the Bridges or the Parliament, is atmospheric in its own right, although I always found myself getting a bus or a taxi home at the end of the night.

The Bow Bar takes you down Victoria Street, which is a nice walk in its own right. It’s got a great whisky list, and the whole place feels like a Real Pub, albeit a bit on the fussy side.

The Sheep Heid Inn is a 14th century pub in Duddingston Village. Definitely worth discovering, particularly if you’re wandering around Arthur’s Seat anyway.

I’ll include Doctors not because I particularly like it, but because it’s so convenient that I always bloody ended up there. It’s fine. It’s totally fine. And it’s right by the Meadows and the Museum of Scotland, which are both places you need to obviously visit if you’re in town. So it’s convenient. But I’ve never been excited. And I don’t have a good reason to explain why not.

RIP The Auld Hoose, which was as close to a local as I had. (Update: it’s all a lie! It still exists! Fake news! Weird!)

And an anti-recommendation: pubs on the Royal Mile and Rose St are often recommended but are not my favorite.

Food

A lot of the places I used to go to are gone. I’ve omitted some frequent haunts — there are chippies I love that are objectively terrible, for example. But also, I didn’t eat out all that much when I lived in Edinburgh. The best general advice I can give you is to try haggis if you haven’t, and avoid what the city calls Mexican food if you’ve ever had the actual cuisine.

There was a point where all the staff knew me at Loudons (which now seems to have two locations; mine was in Fountainbridge). It’s a good breakfast spot that, when it opened, was set up for laptop workers, including printing facilities and so on. That initial intention is long gone, but it’s still a lovely place to meet. Or it was, the last time I was there.

David Bann is an upscale vegetarian restaurant that still seems to be going strong. It was the kind of place I’d go to for special occasions.

The Mosque Kitchen is tasty and affordable; a good place to grab a quick lunch. There seem to be two now — one by the mosque and one on Nicholson St — but the former (the one in the mosque) is the original and the one to try.

Khushi’s was the first restaurant I visited in Edinburgh — and, as it turns out, the first Indian restaurant in the city. It’s relaxed and delicious.

I’m convinced that Cappadocia is the best kebab shop in Edinburgh. Still best enjoyed late at night, after you’ve left the pub.

RIP Forest Café, which was the kind of independent arts space you hope exists in every city. Anarchic, inclusive, and beautiful. It, more than anywhere else, represented exactly what I loved about Edinburgh when I loved it, and I’m really sad it’s gone.

RIP also to that one café by the Meadows with the terrible nachos, which was really important to me, and goddamnit, why can’t I remember its name? (Update: it was Favorit. Don’t believe the people talking about the nachos in the linked thread, though.)

Attractions / Etc

This is a grab bag. I used to walk around the city a lot, and I think these choices reflect that.

The National Museum of Scotland — the Museum of Scotland and the Royal Museum, now combined into one mega-museum — is a big thing, so it’s probably on your list already, but it’s free, as all museums everywhere should be, and it has an epic roof terrace view that you should probably check out.

The Scottish Storytelling Centre is one of those places I’m glad exists in the world. The events are worth catching, and the whole place has been put together with real thoughtfulness. But don’t sleep on the café: one of my favorite things to do was grab a cup of tea and a slice of cake here.

Cramond Island (listed at the link as “Cramond Ghost Island” — really?) has great views across the Forth of Firth and is a fun adventure if it’s not freezing cold. There’s a fairly convenient bus.

Dean Village is completely lovely to walk through. I guess the Dean Gallery has been renamed Modern Two, but it was always worth checking out.

The Water of Leith Walkway takes in Dean Village and various works of art, including Anthony Gormley’s series of statues.

The Cameo Cinema is a century old and going strong. I used to buy film festival tickets by the foot here. Even if you don’t want to take in a movie (and if you want to see a movie, it’s the place to do it), it’s got a nice bar. It shows up in The Illusionist, Sylvain Chomet’s homage to both Edinburgh and Jacques Tati, which is in itself worth checking out before you visit.

Edinburgh Inspace is a creative digital hub that showcases events and exhibitions. The space itself is experimental and is wired up for different kinds of multimedia interactivity. It’s worth checking for upcoming exhibitions.

Not mentioned: anything to do with Harry Potter, even that one pachyderm-themed touristy café that’s still pretty lovely (update: no, never mind, it burned down), because, sincerely, fuck JK Rowling.

Tuesday, 02. April 2024

Moxy Tongue

America Works For People, Building For Happiness..

America comes with a proposition ratified (LLPoH); Sovereignty is processed as a runtime event at the edges. Middleization is an attack vector. Inadequate system designs are a feature flaw of inadequate engineering design processes. Governance by administration of law is absolutely evolutionary. Failures happen. Edits happen. Time continues throughout, and personal views are coherent in context. C

America comes with a proposition ratified (LLPoH); Sovereignty is processed as a runtime event at the edges. Middleization is an attack vector. Inadequate system designs are a feature flaw of inadequate engineering design processes. Governance by administration of law is absolutely evolutionary. Failures happen. Edits happen. Time continues throughout, and personal views are coherent in context. Context is an edge driven event requiring insight by Governance under scrutiny. Scrutiny is a cost of Sovereignty. Accurate scrutiny processes are a function of engineering design process. Participation by people in American civil society is an edge-driven experience by people with blood in their veins. Administration of law by Sovereign process requires accurate context for accurate processing. 

Current inadequate understandings and administration by poorly/ wrongly trained managers of Sovereign law is a time-based experience. Yes, lives are affected. "We hold these truths to be self-evident..."; turns out, with digital context, Sovereignty is not so self-evident to some... tcp/ip engineers under administrative influence released a tool of military-use into civil society without understanding scale of impact "breaking silos" would have, and what Constitutes a silo where data is transported meaningfully.
Structure yields results.. welcome to 2024.
Reminder: Communists have no property Rights. 
Sovereignty; recursive accuracy for people.

Ben Werdmüller

Replicators

I’d love to read about the early days of the Star Trek replicator. It’s a sometimes-useful macguffin in the context of Star Trek: The Next Generation and later shows: a device that can recreate virtually any object on command, from food to electronics. By the time ST:TNG was set, it had become a major engine that transformed Star Trek into a post-scarcity, post-money society. But there had to h

I’d love to read about the early days of the Star Trek replicator. It’s a sometimes-useful macguffin in the context of Star Trek: The Next Generation and later shows: a device that can recreate virtually any object on command, from food to electronics.

By the time ST:TNG was set, it had become a major engine that transformed Star Trek into a post-scarcity, post-money society. But there had to have been an earlier, more transitional state, which is more interesting to me: a time when replicators could recreate virtually anything but society hadn’t quite transitioned to post-money rules.

An artist, musician, or artisan during those times might have found that their work could suddenly be replicated infinitely, but they still needed money to survive. How did Federation culture adapt? Were these people taken care of? Or were they seen as necessary collateral? Did they themselves support the idea of a post-scarcity, post-money society (which I agree would be a good thing!) or did they protest?

And if they did protest, how would we feel about that, knowing their immediate context and where the technology led to?

Were there riots? Should there have been?


Damien Bod

Create conditional access base policies for a Microsoft Entra ID tenant

This article shows some of the base conditional access policies which can be implemented for all Microsoft Entra ID tenants. Phishing resistant authentication should be required for all administration flows and some other user policies like sign-in risk MFA or terms of conditions. I recommend these base policies when implementing an Microsoft Entra ID tenant […]

This article shows some of the base conditional access policies which can be implemented for all Microsoft Entra ID tenants. Phishing resistant authentication should be required for all administration flows and some other user policies like sign-in risk MFA or terms of conditions. I recommend these base policies when implementing an Microsoft Entra ID tenant using a P2 license.

Disable security defaults

The security defaults are a good basic setup, but when a P2 license is used, conditional access policies can be applied and the tenant can be setup to force things like phishing resistant authentication.

Disable on the tenant in the “your-tenant” | Overview | Properties

All the security defaults are disabled and good conditional access policies are now required.

Activate conditional access policies

There are many conditional access policies. These are applied and different depending on the tenant requirements. The following base policies make sense in all tenants:

Force MFA conditional access policy (All users) Require Terms of Use policy Block legacy authentication (All users) Enable Sign-in risk policy (All users) Require phishing resistant authentication for admins Enable User risk policy (All users)

A single break glass account is excluded from these policies and this account should never be used except in an emergency. Alerts are required on this account.

1. Force MFA conditional access policy

Multi-factor authentication can be forced for all users except the break glass account. This uses the “Require authentication strength” policy and the tenant can set the default strength as required.

Add the following policy ( Force MFA All users except break glass account )

2. Require Terms of Use policy

Add a Require Terms of Use for app ( App Require Terms of Use ) policy. You can use Microsoft Entra ID to force the users of the tenant and all the client apps to except the terms of conditions required by the tenant and the hosted applications.

The terms of use needs to be added to the Azure tenant:

https://learn.microsoft.com/en-us/entra/identity/conditional-access/terms-of-use

The policy can be created for the terms of use. See the Microsoft docs for details.

https://learn.microsoft.com/en-us/entra/identity/conditional-access/require-tou

3. Block legacy authentication

Block the legacy authentication in the tenant. The Client apps should select only the Exchange ActiveSync clients and Other clients and the access must be blocked.

https://learn.microsoft.com/en-us/entra/identity/conditional-access/howto-conditional-access-policy-block-legacy

4. Enable Sign-in risk policy

You can activate the sign-in risk and choose how strict. If a risky sign-in is detected, the user is required to do a multi-factor authentication. This requires a P2 license for user accounts. See the Microsoft docs for details:

https://learn.microsoft.com/en-us/entra/id-protection/howto-identity-protection-configure-risk-policies

5. Require phishing resistant authentication for admins

Phishing resistant MFA should be applied to app administrator workloads. This can be created from the Azure provided template.

https://learn.microsoft.com/en-us/entra/identity/conditional-access/how-to-policy-phish-resistant-admin-mfa

The policy is applied to the Azure roles:

Global Administrator Application Administrator Authentication Administrator Billing Administrator Cloud Application Administrator Conditional Access Administrator Exchange Administrator Helpdesk Administrator Password Administrator Privileged Authentication Administrator Privileged Role Administrator Security Administrator SharePoint Administrator User Administrator

When a user account has one of the Azure admin roles, phishing resistant authentication is required for access to the tenant.

6. Enable User risk policy (All users)

If a user account has a high or medium level possibility that it has been compromised, the user is required to do a multi-factor authentication. Why not Self-service password reset (SSPR)? I don’t really see the point of this if you are using passwordless sign-ins. Without a SSPR for a user with a password, the user-risk is not reset and the user will be forced to MFA again. I am not sure how this policy works with passwordless or phishing resistant authentication flows. This policy only makes sense with the high threat category and the block user. This requires a P2 license for users accounts.

Summary

These are the base policies and further policies can be added depending on the tenant requirements. Some session based controls would normally make sense as well.

Notes

The examples of the continuous access policies are shown and set up using the Azure portal. This would be way better as a terraform script and a fully automated set up using something like Azure DevOps or Github actions.

Links

https://learn.microsoft.com/en-us/entra/identity/conditional-access/howto-conditional-access-policy-block-legacy

https://learn.microsoft.com/en-us/entra/identity/conditional-access/require-tou

https://learn.microsoft.com/en-us/entra/identity/conditional-access/howto-conditional-access-policy-block-legacy

https://learn.microsoft.com/en-us/entra/id-protection/howto-identity-protection-configure-risk-policies

https://learn.microsoft.com/en-us/entra/identity/conditional-access/how-to-policy-phish-resistant-admin-mfa

https://learn.microsoft.com/en-us/entra/identity/conditional-access/concept-continuous-access-evaluation

Monday, 01. April 2024

Just a Theory

Talk: Postgres Yum Extensions

On Wednesday at noon US Eastern / 16:00 UTC., Devrim Gündüz will be giving a talk: “Overview of the yum.postgresql.org architecture, how new RPMs are added, and issues and challenges with distributing RPMed extensions.”

Quick reminder that Devrim Gündüz of EnterpriseDB will be giving a talk at this week’s extension ecosystem mini-summit, an overview of the PostgreSQL Yum Repository architecture, how new RPMs are added, and issues and challenges with distributing RPMed extensions.

The community Yum and Apt repositories are the canonical distributors of PostgreSQL community builds on Linux platforms, with a long history of robust support for multiple OSes, OS versions, and PostgreSQL versions. Devrim will help us understand how the Yum repository works, the inclusion of extensions from the broader community, and the challenges for automatic binary packaging for all publicly-available extensions.

Join us! Note! that if you reserved a spot at a prior mini-summit, sadly you will need to do so again for each subsequent summit or miss out on reminders from Eventbrite.

Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

More about… Postgres Yum RPM Extensions PGConf Summit

Sunday, 31. March 2024

Timothy Ruff

Five Failed Blockchains: Why Trade Needs Protocols, Not Platforms

The trade industry needs a “universal translator” First we.trade failed in mid-2022. Then TradeLens at the end of 2022, then Marco Polo in early 2023, then Contour in late 2023. Of the five major trade-related blockchains only Komgo has survived, but only after ditching blockchain. The Trade Blockchain Graveyard Each failed blockchain had enormous support and resources at the outset: We.tra
The trade industry needs a “universal translator” First we.trade failed in mid-2022. Then TradeLens at the end of 2022, then Marco Polo in early 2023, then Contour in late 2023. Of the five major trade-related blockchains only Komgo has survived, but only after ditching blockchain. The Trade Blockchain Graveyard

Each failed blockchain had enormous support and resources at the outset:

We.trade, launched in 2017 and built in collaboration with IBM on Hyperledger Fabric, boasted Deutsche Bank, HSBC, Santander, Societe Generale and UBS.

TradeLens, launched in 2018 and also built on Fabric, had industry giants IBM and Maersk.

Marco Polo, launched 2019 and built on R3’s Corda, had Commerzbank, BNY Mellon and SMBC with financial backing from ING Ventures and BNP Paribas.

Contour, launched in 2020 and also built on Corda, had ANZ, BNP Paribas, HSBC, and Standard Chartered banks along with a strong contingent of trade integration and documentation partners including Finastra, CargoX, Bolero and Surecomp.

Komgo, launched in 2018 on Quorum blockchain infrastructure and dedicated to trade finance, still boasts Citi, ING, and a dozen others partners and has over 100 customers apparently still using the platform, but has since ditched blockchain.

Despite all the money and power behind them, all major trade blockchains have failed in a surprisingly short period of time. There are several other blockchains dedicated to trade, but these five are the big ones.

Why Business Blockchains Fail

So why all the embarrassing failures? Ledger Insights explains the troubling trend this way:

“In most cases, the issue was a failure to achieve market fit and scale before the money ran out rather than any particular blockchain technology.”

I agree, blockchain’s technology didn’t cause these failures. Blockchains fail because they are platforms — an assertion I defend in detail below — and as we’ve seen in industry after industry, it’s nearly impossible to get entire industries to join the same platform. Blockchains succeed when they out-compete other blockchains for adoption, no different than centralized platforms.

What’s needed to enable secure global digital interoperability for trade are protocols, not platforms, specifically protocols for the secure peer-to-peer exchange of verifiable trade instruments. (<< Read the linked essay, it’s important.) While such protocols aren’t the only thing needed to achieve this elusive objective —standard schema also comes to mind , and is discussed below— protocols are necessary to escape the fatal limitations inherent to blockchains and other platforms.

How Blockchains Are Platforms

Blockchains inherently follow the same “EUM” pattern of the centralized, proprietary platforms they claim to replace, screaming “Everybody Use Me”, or more accurately, “Everybody use my service”. And though permissionless blockchains like Bitcoin and Ethereum may have decentralized governance — which is more than cool — business blockchains have their governance centralized around a single decision-making body, typically a consortium, that makes all the rules.

But regardless of whether a blockchain has decentralized or centralized governance, one common, inescapable fatal flaw remains: a blockchain is a single logical database that all counterparties must agree to use; a singular, rent-seeking source of truth that sits in the middle of all interactions just like proprietary platforms do.

Blockchains are not truly peer-to-peer, despite their claims. The fact that all participants depend upon the same master set of data — along with its bespoke technology, economics, security, and governance — is why there are so many blockchains competing for adoption, and with near-zero interoperability between them.

The problem isn’t with blockchain technology, it’s the blockchain model itself.

Platforms vs. Protocols

All blockchains are platforms, and are not protocols. This is not a subjective distinction. There is a simple way to determine whether something is a platform or a protocol: all platforms have one telltale sign in common: a rent-seeking entity or network that sits in the middle of all interactions. A protocol, such as TCP/IP, HTML, SMTP, 802.11, etc., is a language used entirely peer-to-peer, with no rent-seeking entity or network between peers and no master dataset generated or depended upon.

Being a platform doesn’t guarantee failure, however, as we see with Komgo. The world is full of successful platforms and Komgo is still around because it out-competed other trade blockchains for adoption, but then ditched their blockchain when they realized that blockchain was slowing them down and a traditional centralized database was better. The fact that they could make such a smooth transition away from blockchain illustrates my point: whether using a blockchain or a database under the hood, the platform-style relationship between Komgo and its users was identical, with Komgo sitting in the middle of every interaction and all participating counterparties paying rent. That’s a fine way to make money but an impossible way to make global trade digitally interoperable, because for twenty years we’ve learned and re-learned the same lesson, in industry after industry: it’s nearly impossible to get all necessary counterparties to use the same platform, no matter how powerful its partners are and no matter how much money it has.

Like all industries that struggle with digital interoperability, trade needs protocols, not platforms. For two decades now the trade industry has endeavored to create an electronic bill of lading to replace the paper one, without success. In my view this lack of success has been because competitive platforms lack the incentive to directly interoperate, seeing it as a competitive sacrifice, and they have no protocol available as an indirect alternative. Traditional proprietary platforms started this EUM, winner-takes-all pattern, then the blockchain consortia came along and promised something different: an exciting new way to break through the paper-to-digital logjam. But these trade blockchains acted no differently and did no better than the traditional platforms, ultimately leaving the industry still dependent on paper and paper facsimiles (PDFs).

In contrast, an open, non-proprietary protocol designed for trade could bring a new alternative: a common language that even competitive trade counterparties can speak to each other that brings interoperability without sacrificing competitive standing.

A “Universal Translator” For Trade

This brings us to the image and its caption at the beginning of this article: The trade industry needs a “universal translator”.

In Star Trek, when wearing a universal translator you can speak to any creature in the galaxy using your preferred language and they’ll hear you in theirs, and when they speak their preferred language you hear it in yours. Now imagine if all creatures in the universe were required to use a single, proprietary rent-seeking platform, network, or dataset in the middle to accomplish this, instead of adopting a common protocol/language… it would never work. The only way to solve such a galactic problem: a common, open protocol that’s adopted everywhere that enables devices to use a common, agreed-upon language with each other while speaking bespoke languages to each wearer.

That’s what protocols are: languages. In the case of Star Trek’s universal translator, the protocol is the language spoken between the devices; it must be free, open, and separate from the language spoken between the device and its wearer. Protocols don’t generate shared datasets like blockchains do and they don’t have rent-seeking actors in the middle. Protocols are like SMTP that enabled email senders and recipients to each build or buy their own favored email clients, rather than having to use a common platform like AOL, Prodigy, or CompuServe to exchange messages. Languages can be spoken peer-to-peer between any two parties without a third-party between them, so they are inherently decentralized.

Digital Containers & Payloads

It’s tempting to think of a data schema as a protocol, and in a sense it can be: for two parties to have semantic interoperability — to understand each other and be able to ingest and utilize received data from each other — there must be some degree of agreement about data schema. But the same is true with paper, which depends on common data elements if not exact schema. For digital environments, standardized schema for trade is being worked out by the ICC DSI and others, but… even ubiquitously agreed-up on data schema does not address the need for securely exchanging the schematized data; that part is still missing, it’s like agreeing on the format of physical mail without considering the need for a mail carrier. In the digital realm that ‘carrier’ must have verifiable security from origination to delivery. Without security, common schema is pointless; without common schema, security is useless.

To be crystal clear on this critical point: even when a data schema has been widely agreed to, without a protocol — a common language — for exchanging data securely, there’s no way to verify the source of the data and that it hasn’t been tampered with, revoked, or expired. This security element was supposed to be supplied by blockchains — which are quite secure, though they don’t address critical identity or key management issues — but their platform-like attributes prevented broad adoption for the reasons listed above, leaving trade back where it started: paper and paper-like facsimiles (PDFs).

What’s needed are protocols for secure data exchange that act like digital ships and containers, with schematized data as the payload. Imagine a trading world where all data is received in the form of a standardized schema that arrived in a secure data container that is instantly verifiable in three critical ways:

The identity of the originators of the data, and of all who contributed to or endorsed it; It hasn’t been tampered with, revoked, or expired since origination; It doesn’t require subscribing to any particular blockchain, network, or proprietary platform.

Having secure, globally verifiable data containers with schematized data as the payload is when the global trading game really changes. And it requires protocols, not platforms, to eliminate competitive sacrifices, maximize adoption, and have limitless scale.

In Conclusion

This is something we are working on at Digital Trust Ventures: free, open protocols for trade that work like digital ships and containers and result in a “universal translator”-like effect for counterparties, enabling each to speak and hear their chosen “language”. And like physical containers, these digital containers will have no opinion about the payload inside; the job of these protocols is to ensure that exchanged data is verifiably not tampered with, revoked, or expired, without need of platforms, networks, other entities in the middle.

More to come, later this year. Stay tuned.

Saturday, 30. March 2024

Ben Werdmüller

I think it would be fun to ...

I think it would be fun to (co-)organize an East Coast IndieWebCamp this year, mostly because I would like to go to an East Coast IndieWebCamp this year. Perhaps there's scope for an IndieWebCamp NYC in September / October?

I think it would be fun to (co-)organize an East Coast IndieWebCamp this year, mostly because I would like to go to an East Coast IndieWebCamp this year. Perhaps there's scope for an IndieWebCamp NYC in September / October?


Replay: Memoir of an Uprooted Family, by Jordan Mechner

Just fabulous. Maybe it's because his family history is not a world of different from mine, or maybe it's because his journey and interests feels intertwined with my own, but I wept openly as I read the final third of this. It's beautiful and heartbreaking; true and relevant; deeply resonant in the way Maus was a generation ago. I learned about myself as I read it; I can't reco

Just fabulous. Maybe it's because his family history is not a world of different from mine, or maybe it's because his journey and interests feels intertwined with my own, but I wept openly as I read the final third of this. It's beautiful and heartbreaking; true and relevant; deeply resonant in the way Maus was a generation ago. I learned about myself as I read it; I can't recommend it enough. #Nonfiction

[Link]


Over time I'm becoming more and more ...

Over time I'm becoming more and more enamored with the Derek Sivers mindset to posting on the internet.

Over time I'm becoming more and more enamored with the Derek Sivers mindset to posting on the internet.


Wrench in the Gears

Collective Seekers and Degrees of Freedom

For about a week every spring the sugar ants and I have a stand off. With the house going on the market next week, my heart sank to see a few of them wandering around on the counter. While a nuisance, I can see we are cut from the same cloth – foragers, lone seekers. [...]

For about a week every spring the sugar ants and I have a stand off. With the house going on the market next week, my heart sank to see a few of them wandering around on the counter. While a nuisance, I can see we are cut from the same cloth – foragers, lone seekers. I pulled out the Terro ant bait and laid a couple of packs with gooey gel out on the counter where they would be sure to find it. This season their preferred gateway seems to be around the back of the stove. Most years it’s around the sink.

I finished putting together this video around 2am. The house was chilly. As I trundled in my navy polar fleece robe to put my insulated tea mug in the dishwasher, I saw a thick line of teeny, tiny black bodies huddled around the bait. Stigmeric pheromones were at work. A few scout ants had blazed the trail, and now a miniature highway of busy bodies were doing what they do best. Thanks to Stephers’ work I’ve been talking about pheromones and swarm behavior for several years. Still, it’s fascinating to see it in action. Each ant an agent, participating in a complex dance to achieve a group goal in the most efficient way.

The ants don’t realize in their bustling collectivity, they are laying the groundwork for the end of their colony, one of many hidden civilizations that exist among the Lenten Rose and Hostas and under the brick pavers in the back garden. Their behavior is innate and reinforced by the culture into which they were born. The pest control industry knows these ants well and uses their instinct against them. I keep saying we don’t understand the weapons. Heck, I don’t think we even understand the nature of the engagement.

I am a humanities person who chafes at the imposition of the primacy of STEM that has ramped up over the past few decades. I love stories. I love culture. I love the amazing ways humans mold themselves to their environment and one another. When I was in middle school I had a big map on the wall of my bedroom with marks on the many places I wanted to see around the world. I imagined I was going to be an international correspondent. I chuckle now looking back. I have a few cross-border trips under my belt, and with the way passports are going, I expect to spend this final phase of my life not far from a new home. I’ll trade busy airports for quiet mountain streams and a garden and a pile of books. I’ll travel in my mind, perhaps get acquainted with my astral body equipment… It actually seems like a pretty good outcome all told.

In this video I am struggling with ideas of free will, conformity to groups, civilization, faith, and complex systems. It’s certainly not an original question, how many degrees of freedom do we actually have? Ants have an instinct to seek out sources of food for the colony, and yes there are times when after a very rainy week they will be washed out of their regular stomping grounds and end up in my kitchen. And, I am equipped to use their instinct against them. We see it playing out time and time again on the feed. People seeking camaraderie among the organized bits, but how often are they excitedly bringing back poison?

I spent much of my adult life researching civilization as a historic preservationist, a cultural geographer. Was I studying an insidious domestication of the spirit? Or an intricate unfolding of our divine purpose within a collective framework crystallized over countless soul journeys? I’m going to have to get comfortable with the not knowing; at least until I get to the other side.

 

 

 

 

 

 

Friday, 29. March 2024

Aaron Parecki

OAuth: "grant" vs "flow" vs "grant type"

Is it called an OAuth "grant" or a "flow"? What about "grant type"?
Is it called an OAuth "grant" or a "flow"? What about "grant type"?

These are common questions when writing documentation for OAuth-related things. While these terms are all used in RFC 6749 and many extensions, the differences between the terminology is never actually explained.

I wanted to finally write down a definition of the terms, along with examples of when each is appropriate.

flow - use "flow" when referring to the end-to-end process, for example: "the client initiates the flow by..." "the flow ends with the successful issuance of an access token" This can also be combined with the type of flow, for example: "The Authorization Code flow starts by..." grant - use "grant" when referring to the specific POST request to the token endpoint, for example: "The authorization code grant includes the PKCE code verifier..." "The refresh token grant can be used with or without client authentication..." "Grant" also refers to the abstract concept of the user having granted authorization, which is expressed as the authorization code, or implicitly with the client credentials grant. This is a bit of an academic definition of the term, and is used much less frequently in normal conversation around OAuth. grant type - use "grant type" when referring to the definition of the flow in the spec itself, for example: "there are several drawbacks to the Implicit grant type" "the Authorization Code grant type enables the use of..."

Let me know if you have any suggestions for clarifying any of this, or any other helpful examples to add! I'm planning on adding this summary to OAuth 2.1 so that we have a formal reference for it in the future!

Thursday, 28. March 2024

Just a Theory

PGXN v2: Go or Rust?

What programming language(s) should we use to build new and revamp existing PGXN services and tools: Rust or Go? Vote your preference!

TL;DR: I’d like Postgres community input on a decision: Should we build PGXN v2 services and tools in Go or Rust? Context for the question and some weighing of options constitutes the rest of this post, but to skip to the end, 🗳️ Vote your choice! Poll closes April 12 at the end of the day (midnight) New York time.

The PGXN v2 project now under way requires developing or updating several services and tools, including:

A root registry for source distribution A package registry for binary distribution A command line client for developing, building, and distributing extension packages An interactions service for notifications and stats aggregation

And more. Historically, the PGXN tools were written in Perl, which was an ideal choice for me back in 2011, and quite familiar to many members of the core team, but also deeply foreign to most everyone else. Furthermore, its dynamic nature and need for dozens of dependencies in most apps makes installation and packaging a challenge, to say the least.1

In the last ten years I’ve become quite proficient in Go. I appreciate its near system-level performance, memory safety, robust standard library, concurrency design, and short learning curve — especially for web services. But perhaps most eye-opening to me, as a long-time user of dynamic languages, is that, like C, Go compiles an application into a single static binary. Not only that, but Go provides cross compilation natively. This makes distribution incredibly simple.

Distribution Digression

Compare, for example, the Dockerfile for Sqitch, the database change management system I wrote and maintain in Perl. It’s…a lot. Sure there are a ton of system dependencies, but what’s invisible in this file is the weeks of work that went into Module::Build::Sqitch, which performs a bunch of tricks to build the Sqitch “app” as a single directory with all of its Perl dependencies. Don’t get me wrong, the work was worth it for Sqitch, and powers the Homebrew formula, as well. But even there, I’ve not been able to get Sqitch into the Homebrew core because every single dependency requires a checksum, and I’ve not had the time (or energy) to figure out how to generate them.

Contrast with this Dockerfile for a Go service compiled into a binary named thinko:

FROM gcr.io/distroless/base-debian12:latest # TARGETOS and TARGETARCH: https://docs.docker.com/build/guide/multi-platform/ ARG TARGETOS ARG TARGETARCH COPY "_build/${TARGETOS}-${TARGETARCH}/thinko" /thinko/bin/ USER nonroot:nonroot ENTRYPOINT [ "/thinko/bin/thinko" ]

That’s the whole thing. There are no dependencies at all, aside from a few included in distroless image. And where does that image come from? This is the relevant from the project Makefile:

.PHONY: all # Build all binaries all: local linux darwin windows freebsd linux: thinko-linux darwin: thinko-darwin windows: thinko-windows freebsd: thinko-freebsd thinko-linux: _build/linux-amd64/thinko _build/linux-arm64/thinko thinko-darwin: _build/darwin-amd64/thinko _build/darwin-arm64/thinko thinko-windows: _build/windows-amd64/thinko _build/windows-arm64/thinko thinko-freebsd: _build/freebsd-amd64/thinko _build/freebsd-arm64/thinko # Build Thinko for specific platform _build/%/thinko: cmd/thinko GOOS=$(word 1,$(subst -, ,$*)) GOARCH=$(word 2,$(subst -, ,$*)) $(GO) build -o $@ ./$<

This configuration allows me to build thinko for every OS and architecture at once:

$ make thinko go build -o _build/local/thinko ./cmd/thinko GOOS=linux GOARCH=amd64 go build -o _build/linux-amd64/thinko ./cmd/thinko GOOS=linux GOARCH=arm64 go build -o _build/linux-arm64/thinko ./cmd/thinko GOOS=darwin GOARCH=amd64 go build -o _build/darwin-amd64/thinko ./cmd/thinko GOOS=darwin GOARCH=arm64 go build -o _build/darwin-arm64/thinko ./cmd/thinko GOOS=windows GOARCH=amd64 go build -o _build/windows-amd64/thinko ./cmd/thinko GOOS=windows GOARCH=arm64 go build -o _build/windows-arm64/thinko ./cmd/thinko GOOS=freebsd GOARCH=amd64 go build -o _build/freebsd-amd64/thinko ./cmd/thinko GOOS=freebsd GOARCH=arm64 go build -o _build/freebsd-arm64/thinko ./cmd/thinko

Those first two commands build thinko for Linux on amd64 and arm64, right where the Dockerfile expects them. Building then is easy; a separate make target runs the equivalent of:

$ docker buildx build --platform linux/arm64 -f dist/Dockerfile . $ docker buildx build --platform linux/amd64 -f dist/Dockerfile .

The --platform flag sets the TARGETOS and TARGETARCH arguments in the Dockerfile, and because the directories into which each binary were compiled have these same terms, the binary compiled for the right OS and architecture can be copied right in.

And that’s it, it’s ready to ship! No mucking with dependencies, tweaking system issues, removing unneeded stuff from the image. It’s just the bare minimum.

This pattern works not just for Docker images, of course. See, for example, how [Hugo], the Go blog generator, releases tarballs for a bunch of OSes and architectures, each containing nothing more than a README.md, LICENSE.md, and the hugo binary itself. This pattern allows both the Hugo Homebrew formula and its Dockerfile to be incredibly simple.

Back to PGXN

I very much want these advantages for the next generation of PGXN tools. Not only the services, but also the command-line client, which would become very easy to distribute to a wide variety of platforms with minimal effort.

But there are other variables to weigh in the choice of language for the PGXN servers and tools, including:

Familiarity to other developers: Ideally someone can quickly contribute to a project because they’re familiar with the language, or there’s a short learning curve.

Safety from common issues and vulnerabilities such as buffer overflows, and dangling pointers.

Tooling for robust and integrated development, including dependency management, testing, distribution, and of course cross-compilation.

Decisions, Decisions

In my experience, there are two language that fulfill these requirements very well:

🐿️ Go2 🦀 Rust

Which should we use? Some relevant notes:

I expect to do the bulk of the initial development on PGXN v2, as the only person currently dedicated full time to the project, and I’m most familiar with Go — indeed I enjoy writing web services and CLIs in Go!. I’d therefore be able go ship Go tools more quickly.

But I’ve played around with Rust a number of times over the years, and very much would like to learn more. Its syntax and long feature list steepen the learning curve, but given my background in Perl — another language with unique syntax and context-sensitive features — I’m certain I could become incredibly proficient in Rust after a few months.

My employer, Tembo, is a Rust shop, and we’ll likely borrow heavily from the trunk project, especially for the CLI and binary registry. It would also be easier for my coworkers to contribute.

pgrx, the tooling to build Postgres extensions in Rust, has taken the community by storm, rapidly building familiarity with the language among extensions developers. Perhaps some of those developers would also be willing to turn their expertise to PGXN Rust contributions, as well. It’s likely some features could be borrowed, as well.

Sadly, the plgo project appears to have stalled, so has not built up the same community momentum.

This leaves me torn! But it’s time to start coding, so it’s also time to make some decisions. Should PGXN v2 services and tool be:

🐿️ Written in Go 🦀 Written in Rust 🐿️ + 🦀 Some of each (e.g., Go for web services and Rust for CLIs)

What do you think? If you were to contribute to PGXN, what language would you like to work in? Do you think one language or the other would be more compatible with community direction or core development?3

Got an opinion? 🗳️ Vote! Poll closes April 12 at the end of the day (midnight) New York time.

And if those choices aren’t enough for you, please come yell at me on Mastodon, or via the #extensions channel on the Postgres Slack. Thanks!

Ever wonder why PGXN isn’t hosted by community servers? It’s because I screwed up the installation trying to balance all the dependencies without wiping out Perl modules the systems depend on. 🤦🏻‍♂️ ↩︎

Pity there’s no gopher emoji yet. ↩︎

I can imagine a future where an extension CLI was included in core. ↩︎

More about… Postgres PGXN Go Rust Perl

Werdmüller on Medium

An intrusive thought about Trump 2024

What if the worst happens? Continue reading on Medium »

What if the worst happens?

Continue reading on Medium »


@_Nat Zone

vLEI関連企画である KERI、ACDC、CESRの3つ共に4月20日までのパブリックレビューにかかっています

Trust Over IP Foundation (ToIP) で、KERI, ACDC, CESRの3つの仕様がパブリックレビューにかかっています。GLEIFが主導している vLEI (Verifiable Legal Entity Identifier) 関連の規格です。締め…

Trust Over IP Foundation (ToIP) で、KERI, ACDC, CESRの3つの仕様がパブリックレビューにかかっています。GLEIFが主導している vLEI (Verifiable Legal Entity Identifier) 関連の規格です。締め切りは4月20日です。

AUTHENTIC CHAINED DATA CONTAINERS (ACDC) TASK FORCE ANNOUNCES PUBLIC REVIEW

Key Event Receipt Infrastructure (KERI) specification Authentic Chained Data Containers specification (ACDC) Composable Event Streaming Representation specification (CESR)

ToIP曰く

この仕様一式は、識別子、「クレデンシャル」(脚注参照)、および認証の真に分散化された、真正かつ検証可能なエコシステムを構築するための青写真を提供します。

この仕様には、一連のユニークで革新的な機能が記述されています:

鍵の事前ローテーションによる、真に無制限な期間識別子の実現; 暗号化ルート・オブ・トラスト; 連鎖した「クレデンシャル」[脚注参照]と、完全に検証可能な所有者証明および作者証明; 妥協のないスケーラビリティのためにルックアヘッド・ストリーミングをサポートする独自の特性を備えた、テキストとバイナリの両方の表現に等しく最適化されたシリアライゼーション・フォーマット。

この仕様スイートには、Out-Of-Band Introduction、自己アドレス識別子、および組織アイデンティティの包括的なソリューションを提供するために必要な署名付きコンテナの画期的な「パス署名」アプローチなどのサブ仕様が追加されています。

(出所)ToIP

GLEIFの主導しているvLEIではこの仕様が使われているということで、影響が大きくなることが予想される仕様です 。一時はIETFで規格化するという話でしたが、諸般の事情があり、ToIPで規格化することになったようです。ToIPにとっては初めての規格化になるようです。

残念ながらわたしは読む時間はなさそうですが、暗号関係者の方々にはぜひ読んで頂いてコメントをしていただければと思います。

Wednesday, 27. March 2024

Doc Searls Weblog

Why selling personal data is a bad idea

This post is for the benefit of anyone wondering about, researching, or going into business on the proposition that selling one’s own personal data is a good idea. Here are some of my learnings from having studied this proposition myself for the last twenty years or more. The business does exist. See eleven companies in […]
Prompt: “a field of many different kinds of people being harvested by machines and turned into bales of fertilizer.” Via Microsoft CoPilot | Designer.

This post is for the benefit of anyone wondering about, researching, or going into business on the proposition that selling one’s own personal data is a good idea. Here are some of my learnings from having studied this proposition myself for the last twenty years or more.

The business does exist. See eleven companies in Markets for personal data listed among many other VRM-ish businesses on the ProjectVRM wiki. The business category harvesting the most personal data is adtech (aka ad tech and “programmatic”) advertising, which is the surveillance-based side of the advertising business. It is at the heart of what Shoshana Zuboff calls surveillance capitalism, and is now most of what advertising has become online. It’s roughly a trillion-dollar business. It is also nothing like advertising of the Mad Men kind. (Credit where due: old-fashioned advertising, aimed at whole populations, gave us nearly all the brand names known to the world). As I put it in Separating Advertising’s Wheat and Chaff, Madison Avenue fell asleep, direct response marketing ate its brain, and it woke up as an alien replica of itself. Adtech pays nothing to people for their data or data about them. Not personally. Google may pay carriers for traffic data harvested from phones, and corporate customers of auctioned personal data may pay publishers for moments in which ads can be placed in front of tracked individuals’ ears or eyeballs. Still, none of that money has ever gone to individuals for any reason, including compensation for the insults and inconveniences the system requires. So there is little if any existing infrastructure on which paying people for personal data can be scaffolded up. Nor are there any policy motivations. In fact, Regulations have done nothing to slow down the juggernaut of growth in the adtech industry. For Google, Facebook, and other adtech giants, paying huge fines for violations (of the GDPR, the CCPA, the DMA, or whatever) is just the cost of doing business. The GDPR compliance services business is also in the multi-$billion range, and growing fast. In fact, Regulations have made the experience of using the Web worse for everyone. Thank the GDPR for all the consent notices subtracting value from every website you visit while adding cognitive overhead and other costs to site visitors and operators. In nearly every case, these notices are ways for site operators to obey the letter of the GDPR while violating its spirit. And, although all these agreements are contracts, you have no record of what you’ve agreed to. So they are worse than worthless. Tracking people without their clear and conscious invitation or a court order is wrong on its face. Period. Full stop. That tracking is The Way Things Are Done online does not make it right, any more than driving drunk or smoking in crowded elevators was just fine in the 1950s. When the Digital Age matures, decades from now, we will look back on our current time as one thick with extreme moral compromises that were finally corrected after the downsides became clear and more ethically sound technologies and economies came along. One of those corrections will be increasing personal agency rather than just corporate capacities. In fact, Increasing personal independence and agency will be good for markets, because free customers are more valuable than captive ones. Having ways to gather, keep, and make use of personal data is an essential first step toward that goal. We have made very little progress in that direction so far. (Yes, there are lots of good projects listed here, but there we still a long way to go.) Businesses being “user-centric” will do nothing to increase customers’ value to themselves and the marketplace. First, as long as we remain mere “users” of others’ systems, we will be in a subordinate and dependent role. While there are lots of things we can do in that role, we will be able to do far more if we are free and independent agents. Because of that, We need technologies that create and increase personal independence and agency. Personal data stores (aka warehouses, vaults, clouds, life management platforms, lockers, and pods) are one step toward doing that. Many have been around for a long time: ProjectVRM currently lists thirty-three under the Personal Data Stores heading. Some have been there a long time. The problem with all of them is that they are still too focused on what people do as social beings in the Web 2.0 world, rather than on what they can do for themselves, both to become more well-adjusted human beings and more valuable customers in the marketplace. For that, It will help to have independent personal AIs. These are AI systems that work for us, exclusively. None exist yet. When they do, they  will help us manage the personal data that fully matters: Contacts—records and relationships Calendars—where we’ve been, what we’ve done, with whom, where, and when Health records and relationships with providers, going back all the way Financial records and relationships, including past and present obligations Property we have and where it is, including all the small stuff Shopping—what we’ve bought, plan to buy, or might be thinking about, Subscriptions—what we’re paying for, when they end or renew, what kind of deal we’re locked into, and what better ones might be out there. Travel—Where we’ve been, what we’ve done, with whom, and when

Personal AIs are today where personal computers were fifty years ago. Nearly all the AI news today is about modern mainframe businesses: giants with massive data centers churning away on ingested data of all kinds. But some of these models are open sourced and can be made available to any of us for our own purposes, such as dealing with the abundance of data in our own lives that is mostly out of control. Some of it has never been digitized. With AI help it could be.

I’m in a time crunch right now. So, if you’re with me this far, read We can do better than selling our data, which I wrote in 2018 and remains as valid as ever. Or dig The Intention Economy: When Customers Take Charge (Harvard Business Review Press, 2012), which Tim Berners Lee says inspired Solid. I’m thinking about following it up. If you’re interested in seeing that happen, let me know.


Phil Windleys Technometria

Web 2.0 is Collapsing Under its Own Weight

The overhead of performing even simple tasks online is getting larger and larger. I question the security of almost all these supposedly "secure" messaging systems. And I'm tired of the 'Utopia of Rules' mindset pervasive in every organization. It's exhausting how they expect customers to constantly adapt to their needs.

The overhead of performing even simple tasks online is getting larger and larger. I question the security of almost all these supposedly "secure" messaging systems. And I'm tired of the 'Utopia of Rules' mindset pervasive in every organization. It's exhausting how they expect customers to constantly adapt to their needs.

I don't know if you recall the game Kerplunk. It's a classic children's game that has been around for decades. I remember playing it with my sister. The basic setup involves a transparent plastic tube, a number of sticks, and marbles. The sticks are threaded through the tube to form a web or nest at the bottom on which the marbles rest. We'd take turns removing a stick at a time, trying not to let any marbles fall through the web and out of the tube. At some point, the remaining sticks can't hold the marbles and everything falls down.

Share

The modern web reminds me more and more of a big Kerplunk game and I think the marbles are about to fall. What started out as an easier way to do things like shop, bank, and get health care information has become increasingly complex over time. More and more of the email I receive seems to be simply directing me to log into some bespoke system to retrieve a message or engage in some workflow. And even with a password manager, the act of logging in is often a chore with different user interfaces, custom MFA requirements, and weird rules for passwords. Once you're on the system, session time-outs induce their own form of anxiety since stepping away for a few minutes to attend to something else might require going through the whole Kafkaesque process all over again. The modern web has turned into a dystopian theater of the absurd where even reading a simple appointment reminder from your doctor requires several minutes of stress-inducing interaction with baroque systems and processes.

And it's not just doctors, of course, banks, government agencies, hospitals, ecommerce sites, and customer service systems all adopt these special purpose messaging systems. If you ask these organizations why they use bespoke messaging systems, they'll list things like "timely and improved communication," "convenience," and "privacy and security." But the real reason is that it's more convenient for them because these systems are integrated with their backends and make their processes more manageable. There's certainly nothing about them that's more convenient, timely, or better than email for their customers1.

I also question the privacy and security premise. Email can be insecure. And your email provider can see the contents of your emails. But the messaging system run by your doctor or bank is likely less secure than the email systems run by Apple, Google, and the others. And achieving privacy by making everything incompatible so that you have to use a different system for each correspondent is like chopping off your finger to prevent hangnails.

How did we get here? Bureaucracy. Not just government bureaucracy, but bureaucracy of all kinds. In Utopia of Rules2, David Graeber talks about how power imbalances force the less powerful group to perform what he calls interpretive labor, the work of understanding and implementing what's better or more convenient for the more powerful partner. People are not equal participants in online interactions. We don't have the tools to be fully embodied online3. Because of this we are forced to play by the rules organizations online who are digitally embodied with servers, identity systems, customer management systems, and so on. And part of that is being forced to use their inconvenient and anemic messaging systems.

What's the answer? People need tools. I think digital wallets (a bad name for an important tool), autonomic (peer) identifiers with strong cryptography, and verifiable credentials are a huge step forward. These tools provide the means for people to be peers online rather that mere ghosts in someone else's machine. That's why I insist on using the term self-sovereign rather than decentralized to describe these systems. Cogito Ergo Sum.

Subscribe now

Notes

For a deeper dive into why one-off messaging systems are never as good as email, see Rich Sharing and Personal Channels. Email and other useful messaging systems exhibit a property called rich sharing that makes them much more robust that the simple idea of "sharing a message" would bring to mind.

If you're interested in power imbalances and how they come about, I can't recommend Graeber's book highly enough. He had such a keen understanding of this problem and wrote about it in a way that's both informative and entertaining.

I talk about this in more detail in Chapter 17 of Learning Digital Identity when I discuss authentic digital relationships.

Photo Credit: Playing Kerplunk from DALL-E (public domain) Prompt: Draw a picture of a boy and girl playing kerplunk that's 1200x500 pixels


@_Nat Zone

Googleマイアドセンターを見て自分がどう認識されているか見てみよう〜ちなみにわたしは18〜34歳独身女性になってたw

Googleマイアドセンターは、Googleがユーザーのオンライン行動から収集した情報を基に、ユーザーのプロフィールを推定し、それを本人に開示するためのプラットフォームです。主な特徴は以下の通りです。 マイアドセンターは、Googleがユーザーをどのように理解しているかを本人に開…

Googleマイアドセンターは、Googleがユーザーのオンライン行動から収集した情報を基に、ユーザーのプロフィールを推定し、それを本人に開示するためのプラットフォームです。主な特徴は以下の通りです。

興味関心カテゴリ: Googleがユーザーのウェブ閲覧履歴などから推定した、ユーザーの興味関心のあるカテゴリを表示します。 人口統計情報: 年齢層や性別など、Googleが推定したユーザーの人口統計情報を確認できます。 広告設定: ユーザーは自分のプロフィール情報に基づいたパーソナライズド広告の表示を管理・オプトアウトできます。 情報の透明性: Googleがユーザーをどのように認識しているかを本人が確認できるため、情報収集の透明性が高まります。 プライバシー管理: ユーザーはマイアドセンターから、Googleによる情報収集・利用に関するプライバシー設定を変更できます。

マイアドセンターは、Googleがユーザーをどのように理解しているかを本人に開示することで、情報の透明性を高め、ユーザーのプライバシー管理を促進するためのツールです。ユーザーは自分のプロフィールを確認し、必要に応じて設定を変更することで、パーソナライズド広告などのサービスを自分の好みに合わせて管理できます。

というわけで、早速行って、Googleが膨大な情報をつかってわたしをどのように認識しているかを見てみましょう!みなさんもご一緒に

(図表1)Googleはわたしを女性、18〜34歳、独身 etc とみなしているようだ(出所)Google

どうやら、Googleさんは私のことを18〜34歳女性、独身、大企業所属と認識しているようですw。このくらいの女性って、プライバシーを守るために画面の右上に出ているような老人男性のアバターを使う傾向があるのでしょうか…。なお、正解はアラ還暦既婚男性中小企業所属です。1

みなさんはどう出ましたか?面白いのが出たら、ぜひコメントに書き込んでください。

(図表2)推測に使われているアクティビティ(出所)Google 追伸

ちなみに、Googleがユーザーの興味関心カテゴリを推定する際には、主に以下の情報源が使われているとのことです。

検索履歴: ユーザーがGoogleで検索したキーワードやフレーズから、関心のあるトピックを推定します。 ウェブ閲覧履歴: Googleアナリティクスを導入しているサイトを訪問した際の閲覧履歴から、ユーザーの興味関心を推測します。 YouTubeの視聴履歴: ユーザーがYouTubeで視聴した動画の内容から、関心のあるカテゴリを割り出します。 Googleアカウントの情報: ユーザーがGoogleアカウントに登録した情報(年齢、性別、職業など)も、興味関心の推定に活用されます。 Androidデバイスの使用履歴: Androidスマートフォンやタブレットを使用している場合、インストールしているアプリやアプリの利用状況から興味関心を推測します。 位置情報: ユーザーの位置情報から、よく訪れる場所やその場所に関連するカテゴリを推定します。

これらの情報を機械学習アルゴリズムで分析することで、Googleはユーザーの興味関心カテゴリを自動的に推定しています。ただし、ユーザーがプライバシー設定でこれらの情報の収集を制限している場合は、推定の精度が下がる可能性があります。

また、興味関心カテゴリは定期的に更新されるため、ユーザーの関心の変化に合わせてカテゴリも変化していきます。

しかし、行動からのプロファイリングってうまくいかないものなんですね。わたしの場合はまぁ良いですが、こうした間違ったプロファイリングはプライバシー上の被害を産むこともありますから、きちんとチェックしたいものです。 

Tuesday, 26. March 2024

@_Nat Zone

ナショナルオーストラリア銀行の円卓会議で開会の辞を行いました

3月26日日本時間正午ナショナルオーストラリア銀行のコーポレートデジタルIDラウンドテーブルでオープニングアドレス(開会の辞)を行いました。National Australia Bank (ナショナルオーストラリア銀行)、通称NABはオーストラリア国内最大の資産を持つ都市銀行です…

3月26日日本時間正午ナショナルオーストラリア銀行のコーポレートデジタルIDラウンドテーブルでオープニングアドレス(開会の辞)を行いました。National Australia Bank (ナショナルオーストラリア銀行)、通称NABはオーストラリア国内最大の資産を持つ都市銀行です。

プログラム

コーポレートデジタルIDラウンドテーブルのプログラムは以下のとおりです。(和訳は以下)

Corporate ID in a nutshell

What is the purpose and benefits of Corporate Digital ID? What are the current pain points for ID&V for corporate entities? What are the similar and unique challenges for Corporate Digital ID vis-a-vis Individual Digital ID?   Is a common framework linking Individual Digital ID and Corporate Digital ID necessary/desirable? Why/Why not?

Spotlight on the Global LEI systems (as a potential global and interoperable framework for Corporate Digital ID)

What are the origins of the GLEI, how does it operate and what is its purpose? What are the key adoption challenges and opportunities? A role for global standards? What’s needed for success?

Industry / Government / Cross-border collaboration and partnerships

What is the role of Government/Industry in Corporate Digital ID ecosystems (how does this differ from Individual Digital ID)? What are the opportunities for industry collaboration, public-private partnerships? What are the opportunities for cross border co-operation to promote adoption and interoperability of corporate digital ID systems like the GLEI system? How do we ensure interoperability, accommodate within the broader framework of Digital ID in Australia where this is warranted/desirable

Conclusion and next steps

企業IDの概要

企業デジタルIDの目的とメリットとは? 法人向けID&Vの現在のペインポイントとは? 個人デジタルIDに対する法人デジタルIDの類似点、独自課題とは? 個人デジタルIDと企業デジタルIDをつなぐ共通のフレームワークは必要/望ましいか?なぜ/なぜそうしないのか?

グローバル LEI システムのスポットライト(企業デジタル ID のための潜在的なグローバルで相互 運用可能な枠組みとして)

GLEI の起源、運用方法及び目的は何か? 主要な導入の課題と機会は何か? グローバルスタンダードの役割とは? 成功のために何が必要か?

産業界/政府/国境を越えた協力とパートナーシップ

企業デジタル ID エコシステムにおける政府/産業の役割(個人デジタル ID との違い)とは? 産業界の協力、官民パートナーシップの機会とは? GLEI システムのような企業デジタル ID システムの採用と相互運用性を促進するための 国境を越えた協力の機会とは? 相互運用性を確保し、オーストラリアにおけるデジタル ID の広範な枠組みの中で相互 運用性が保証される/望ましい場合には、どのように対応するか? 背景文献

このラウンドテーブルの背景文献としてはDouglas Arner 他による国際決済銀行(BIS)の論文 「 Corporate digital identity: no silver bullet, but a silver lining」が指定されていました。なお、Douglas Arner氏も本ラウンドテーブルに参加していました。

スピーチ

これを受けて、オープニングアドレスとしてわたしは以下のようなスピーチをしています(和訳はこの後にあります)。

Introduction:

Thank you for the introduction and thank you for inviting me to this roundtable. It is my honour to be with you all. Let me provide a few remarks on Corporate Digital Identity. 

Corporate digital identity, or corporate ID, has the potential to dramatically simplify the identification and verification of companies, reducing the risks and costs of doing business. It can act as an admission ticket for companies to access financial services more efficiently. But developing effective corporate ID systems requires addressing several key challenges. I have enumerated seven of them for today. 

Key points: Corporate ID is more complex than individual ID. A company’s attributes like directors and ownership structure can change frequently and span multiple jurisdictions. Identifying ultimate beneficial owners is a particular challenge.  Company registries play a foundational role as the authoritative source of core company data. But many registries need to enhance data openness, quality, depth and connectivity to better support corporate ID. Often, data do not get updated in a timely manner and we would be looking at stale data.  With Bank-related initiatives: Banks have an opportunity to monetise their KYC investments by providing corporate ID services, thus turning a cost centre into a profit centre. The Open Digital Trust Initiative, jointly launched by the Institute of International Finance and the OpenID Foundation is such an initiative, but face obstacles around cost, data sharing, liability and competition. KYC utilities offer potential but have seen mixed results so far. The Legal Entity Identifier (LEI) provides a global, unique, and interoperable identifier as a starting point for corporate ID. Enhancements like the verifiable LEI and LEI embedded in digital certificates could expand benefits and adoption. However, coverage of the registered companies is yet to be improved. Also, vLEI being built on a completely different technical stack than other verifiable credentials systems may pose adoption challenges.  OpenID, an open standard and decentralized authentication protocol, and its extension “OpenID for Identity Assurance”, which expresses the provenance and quality of the attributes, have significant potential for enabling secure and trusted identity assurance and data sharing between companies and service providers. The Global Assured Identity Network (GAIN) initiative aims to build on banks’ existing KYC processes and the OpenID standard to create a global, interoperable corporate ID and authentication system. It is like building bridges among islands of ecosystems. Interoperability among different ecosystems has been demonstrated through a technical proof of concept, which was led by Dima Postonikov in the Sydney room, but business and operational reality must catch up before it becomes ready to take off.  Decentralized identifiers and verifiable credential models, sometimes built on blockchain, aim to give companies more control over their data while enabling trusted data sharing. However, significant infrastructure investment and maturity are still needed.  Some governments are proactively developing corporate ID infrastructure as a public good, such as the account aggregator framework in India which empowers SMEs to digitally share their data for better access to finance. While it has found tractions in these economies, whether the pattern will propagate to other parts of the world is yet to be determined.  Conclusion:

In summary, corporate ID is progressing but remains fragmented today. There is no silver bullet – a range of stakeholders including registries, banks, service providers and policymakers have important roles to play. Enhancements to the LEI system, OpenID-based initiatives like GAIN, decentralized identity standards, and public infrastructure could help accelerate the development of a trusted and inclusive corporate ID ecosystem supporting improved financial stability, integrity, and access. Political will and multi-stakeholder coordination is essential to realizing the full potential.

I am hopeful that I will find hints to some of the key challenges that I have cited in today’s roundtable.  

以下、自動翻訳です。

はじめに:

この円卓会議にお招きいただき、ありがとうございます。皆さんとご一緒できて光栄です。コーポレート・デジタル・アイデンティティについて少し述べさせていただきます。

コーポレート・デジタル・アイデンティティ、すなわち企業IDは、企業の識別と確認を劇的に簡素化し、ビジネスを行う上でのリスクとコストを削減する可能性を秘めています。企業が金融サービスをより効率的に利用するための入場券として機能します。しかし、効果的な企業IDシステムを開発するには、いくつかの重要な課題に取り組む必要があります。今日はそのうちの7つを列挙してみました。

キーポイント 法人IDは個人IDよりも複雑です。役員や所有形態など企業の属性は頻繁に変更され、複数の法域にまたがる可能性があります。最終的な受益者の特定は特に難しい課題です。 企業レジストリは、企業データの権威ある情報源として中核的な基礎的な役割を果たしています。しかし、多くのレジストリは、企業IDをよりよくサポートするために、データの公開性、質、深さ、および接続性を強化する必要があります。多くの場合、データはタイムリーに更新されず、私たちは古いデータを見ることになります。 銀行関連のイニシアティブ 銀行は企業IDサービスを提供することで、KYCへの投資を収益化し、コスト・センターをプロフィット・センターに変えるチャンスがあります。国際金融研究所(Institute of International Finance)とOpenIDファウンデーション(OpenID Foundation)が共同で立ち上げたオープン・デジタル・トラスト・イニシアチブ(Open Digital Trust Initiative)はそのようなイニシアチブですが、コスト、データ共有、責任、競争などの面で障害に直面しています。KYCユーティリティは可能性を秘めていますが、これまでのところ結果はまちまちです。 取引主体識別子(LEI)は、企業 ID の出発点として、グローバルで一意かつ相互運用可能な 識別子を提供。検証可能な LEI や電子証明書に組み込まれた LEI のような機能強化は、便益と採用を拡大する可能性がある。しかし、登録企業のカバレッジはまだ改善されていない。また、vLEI は他の検証可能なクレデンシャル・システムとは全く異なる技術スタック上に構築され ていることから、採用の課題が生じる可能性がある。 オープンスタンダードで分散型の認証プロトコルである OpenID と、その拡張機能である「OpenID for Identity Assurance」は、属性の実証性と品質を表現するものであり、企業とサービスプロバイダ間で安全で信頼できる ID 保証とデータ共有を可能にする大きな可能性を秘めています。グローバル・アシュアード・アイデンティティ・ネットワーク(GAIN)イニシアチブは、銀行の既存のKYCプロセスとOpenID標準を基盤として、グローバルで相互運用可能な企業IDおよび認証システムを構築することを目指しています。これは、エコシステムの島々の間に橋を架けるようなものです。異なるエコシステム間の相互運用性は、シドニーの部屋で参加しているディマ・ポストニコフが主導した技術的な概念実証を通じて実証されていますが、ビジネスと運用の現実は、それが離陸する準備が整う前に追いつかなければなりません。 非中央集権的な識別子と検証可能なクレデンシャル・モデルは、時にはブロックチェーン上に構築され、信頼できるデータ共有を可能にしながら、企業がデータをよりコントロールできるようにすることを目指しています。しかし、インフラへの多大な投資と成熟がまだ必要です。 インドのアカウント・アグリゲーター・フレームワークのように、公共財としての企業IDインフラを積極的に開発している政府もあります。このような経済圏では牽引役となっていますが、このパターンが世界の他の地域に伝播するかどうかはまだわかりません。 結論:

要約すると、企業IDは進歩していますが、現在も断片的なままです。レジストリ、銀行、サービス・プロバイダー及び政策立案者を含む様々な利害関係者 が果たすべき重要な役割。LEI システムの強化、GAIN のようなオープン ID ベースのイニシ アティブ、分散化された ID 標準、及び公共インフラは、金融の安定性、完全性、及びアクセスの改 善を支援する信頼され包括的な企業 ID エコシステムの開発を加速するのに役立つ可能性があ ります。政治的な意志とマルチステークホルダーの協調が、その可能性を完全に実現するために不可欠です。

本日の円卓会議で私が挙げた重要な課題のいくつかにヒントが見つかることを期待しています。

Monday, 25. March 2024

Just a Theory

Mini Summit Two

A rough transcript my Ian Stanton’s talk, “Building Trunk: A Postgres Extension Registry and CLI”, along with interspersed chat comments.

We had such thoughtful and engaged discussion at this week’s Postgres Extension Ecosystem Mini-Summit! I did learn that one has to reserve a spot for each mini-summit individually, however. Eventbrite sends reminders for each one you sign up for, not all of them.

To reserve a spot and be reminded for forthcoming meetings, hit the Eventbrite page and select a date and hit “Reserve a Spot” for each date you’d like to attend.

Back to this week’s meetup. My colleague Ian Stanton of Tembo gave a great talk, “Building Trunk: A Postgres Extension Registry and CLI”, that provided background on the motivations and problems that inspired the creation of trunk, a binary packaging system for Postgres extensions.

The presentation was followed by 35+ minutes of questions, discussion, and brainstorming, which I’ve summarized below. But first, links!

Video PDF Slides

Now down to business.

Introduction I opened the meeting and introduced Ian Stanton. Presentation

Ian introduced himself and trunk, “an extension registry and CLI supporting Tembo Cloud. Wants to tell a story, starting with backstory.

Tembo founded November 2022, provide managed Postgres solution called Tembo Cloud. Idea is Postgres can be used for so many different things through the power of extensions, so built use-case optimized “stacks” flavors of Postgres powered by extensions and configurations. Super proud of them, including Message Queue, for which we build an open-source extension.

Envisioned ability to install any extension, including user-provided extensions. Knew we’d need an extension management solution. So we built it.

It’s called trunk, an extension registry and CLI, an open-source app for the community that hosts binary packages for extensions, and powers Tembo Cloud’s extension management.

Q1 2023 had build Tembo CLoud v1 with all extensions bundled in containers. But wanted way to install them on the fly, ideally with installable packages. Explored the ecosystem for tool we could use.

PGXN first we found. Love it, backed by the community, been around since 2011, but hosted source code, not binaries. Also little development since 2012.

Apt and Yum repositories are community-backed and are binaries, just what we wanted, but smaller subset of extensions relative to the 1000s available. Thought it would be too time-consuming to add them all through the community process.

*Steven Miller: in chat: “Also with apt packaging, it requires to install to a specific path, but we needed to customize the install path based on what pg_config shows for share lib and package lib dir. That way we could persist extension installations on tembo cloud”

Weighed pros and cons of building one. Pros:

Full control over integration with Tembo Cloud Binary distribution We could build new features quickly We could publish new extensions quickly

Cons:

How will the community react? Recreating the wheel?

Expected to publish 2–3 extension a day, only do-able with a solution we built.

Want to build something meaningful for Tembo Cloud and the community.

Astronomer Registry for Airflow: Built by Astronomer to find modules for Airflow, very well received by the community.

PGXN, Apt, and Yum repos: Wanted to take the best of them and build on it.

crates.io: Wanted a similar great experience for Postgres extensions.

Vision boiled down to discoverability, categories, ratings system, certification, and indexing of cloud provider support.

Want to package any extension, whether SQL, C/SQL, or pgrx.

Simple experience, like cargo publish and cargo install cargo-pgrx.

Eric in chat: “❤️”

Hopes and Dreams: had idea people would magically show up, contribute to the code, and publish their extensions. Wanted to support multiple platforms, architectures, and Postgres versions, and for it to be a one-stop shop for Postgres extensions.

How it works.

CLI and Registry, written in Rust, uses Docker to build extensions. Packages named <trunk-project-name>-<version>-<pg-version>.tar.gz. Published with trunk publish and installed with trunk install, putting all the files in the right places.

Steven Miller in chat: “The approach to use docker for building has been nice. It allows for cross-compile, for example, building for any platform docker supports with the —platform flag”

Registry stores metadata and service web site and API, and uses S3 bucket for the tar-gzip files.

Example building semver extension:

Create Trunk bundle: bitcode/src/semver/src/semver.bc bitcode/src/semver.index.bc semver.so licenses/LICENSE extension/semver--0.10.0--0.11.0.sql extension/semver--0.11.0--0.12.0.sql extension/semver--0.12.0--0.13.0.sql extension/semver--0.13.0--0.15.0.sql extension/semver--0.15.0--0.16.0.sql extension/semver--0.16.0--0.17.0.sql extension/semver--0.17.0--0.20.0.sql extension/semver--0.2.1--0.2.4.sql extension/semver--0.2.4--0.3.0.sql extension/semver--0.20.0--0.21.0.sql extension/semver--0.21.0--0.22.0.sql extension/semver--0.22.0--0.30.0.sql extension/semver--0.3.0--0.4.0.sql extension/semver--0.30.0--0.31.0.sql extension/semver--0.31.0--0.31.1.sql extension/semver--0.31.1--0.31.2.sql extension/semver--0.31.2--0.32.0.sql extension/semver--0.32.1.sql extension/semver--0.5.0--0.10.0.sql extension/semver--unpackaged--0.2.1. sql extension/semver.control extension/semver.sql manifest. json Packaged to •/. trunk/pg_semver-0.32.1-pg15.tar.gz

Package up SQL files, control file, SO files, bitcode files into gzip file.

Once it’s published, API surfaces all this information:

[ { "name": "pg_semver", "description": "A semantic version data type for PostgreSQL.", "documentation_link": "https://github.com/theory/pg-semver", "repository_link": "https://github.com/theory/pg-semver", "version": "0.32.0", "postgres_versions": [ 15 ], "extensions": [ { "extension_name": "semver", "version": "0.32.0", "trunk_project_name": "pg_semver", "dependencies_extension_names": null, "loadable_libraries": null, "configurations": null, "control_file": { "absent": false, "content": "" } } ], "downloads": [ { "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg15-0.32.0.tar.gz", "pg_version": 15, "platform": "linux/amd64", "sha256": "016249a3aeec1dc431fe14b2cb3c252b76f07133ea5954e2372f1a9f2178091b" } ] }, { "name": "pg_semver", "description": "A semantic version data type for PostgreSQL.", "documentation_link": "https://github.com/theory/pg-semver", "repository_link": "https://github.com/theory/pg-semver", "version": "0.32.1", "postgres_versions": [ 15, 14, 16 ], "extensions": [ { "extension_name": "semver", "version": "0.32.1", "trunk_project_name": "pg_semver", "dependencies_extension_names": null, "loadable_libraries": null, "configurations": null, "control_file": { "absent": false, "content": "# semver extension\ncomment = 'Semantic version data type'\ndefault_version = '0.32.1'\nmodule_pathname = '$libdir/semver'\nrelocatable = true\n" } } ], "downloads": [ { "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg14-0.32.1.tar.gz", "pg_version": 14, "platform": "linux/amd64", "sha256": "f412cfb4722eac32a38dbcc7cd4201d95f07fd88b7abc623cd84c77aecc8d4bb" }, { "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg15-0.32.1.tar.gz", "pg_version": 15, "platform": "linux/amd64", "sha256": "9213771ffc44fb5a88726770f88fd13e62118b0f861e23271c3eeee427a23be9" }, { "link": "https://cdb-plat-use1-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg16-0.32.1.tar.gz", "pg_version": 16, "platform": "linux/amd64", "sha256": "8ffe4fa491f13a1764580d274e9f9909af4461aacbeb15857ab2fa235b152117" } ] } ]

Includes different tar-gzip files for different versions of Postgres, the contents of the control file, dependencies; loadable libraries and configurations; and the one extension in this package — some can have many like PostGIS. Then Postgres version support and some other metadata.

What it looks like on the web site, includes README contents, data from the last slide, install command, etc.

This is what installation looks like:

$ trunk install pg_semver Using pkglibdir: "/usr/lib/postgresql/16/lib" Using sharedir: "/usr/share/postgresql/16" Using Postgres version: 16 info: Downloading from: https://cdb-plat-usel-prod-pgtrunkio.s3.amazonaws.com/extensions/pg_semver/pg_semver-pg16-0.32.1.tar.gz info: Dependent extensions to be installed: [] info: Installing pg_semver 0.32.1 [+] bitcode/src/semver/src/semver.bc => /usr/lib/postgresql/16/lib [+] bitcode/src/semver. index.bc => /usr/lib/postgresql/16/lib [+] semver.so => /usr/lib/postgresql/16/lib info: Skipping license file licenses/LICENSE [+] extension/semver--0.10.0--0.11.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.11.0--0.12.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.12.0--0.13.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.13.0--0.15.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.15.0--0.16.0.sql = /usr/share/postgresql/16 [+] extension/semver--0.16.0--0.17.0.sql => /us/share/postgresql/16 [+] extension/semver--0.17.0--0.20.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.2.1--0.2.4.sql => /usr/share/postgresql/16 [+] extension/semver--0.2.4--0.3.0.sql > /us/share/postgresql/16 [+] extension/semver--0.20.0--0.21.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.21.0--0.22.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.22.0--0.30.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.3.0--0.4.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.30.0--0.31.0.sql = /usr/share/postgresql/16 [+] extension/semver--0.31.0--0.31.1.sql => /usr/share/postgresql/16 [+] extension/semver--0.31.1--0.31.2.sql => /usr/share/postgresql/16 [+] extension/semver--0.31.2--0.32.0.sql => /usr/share/postgresql/16 [+] extension/semver--0.32.1.sql => /usr/share/postgresql/16 [+] extension/semver--0.5.0--0.10.0.sql => /usr/share/postgresql/16 [+] extension/semver--unpackaged--0.2.1.sql => /usr/share/postgresql/16 [+] extension/semver.control => /usr/share/postgresql/16 [+] extension/semver.sql => /usr/share/postgresql/16 *************************** * POST INSTALLATION STEPS * *************************** Install the following system-level dependencies: On systems using apt: libc6 Enable the extension with: CREATE EXTENSION IF NOT EXISTS semver CASCADE;

CLI pulls down the tar-gzip, unpacks it, and puts the files in the right places and tells the users what other commands are needed to enable the extension.

Pause to take a sip of water.

David Wheeler (he/him) in chat: “STAY HYDRATED PEOPLE!”

State of the project. Trunk powers extension management for Tembo Cloud, 200 extensions on the platform, install and enable on the fly. Tembo Cloud likely trunk’s #1 user.

Get lots of site traffic, especially around categorization, addresses the discoverability problem set set out to solve.

Jeremy S in chat: “Interested in insights from site traffic - you mentioned that ‘categorization’ was popular - any other things that traffic patterns seem to suggest you might have done really well, or clearly is needed?”

But pretty minimal community involvement, out fault for not involving the community early on.

Did we solve the problem?

For Tembo Cloud: yes! Trunk is core component of the Tembo Cloud platform that lest us offer high number of extensions. For the community: no! But helped bring more awareness to the opportunities to improve the ecosystem as a community. Saw other solutions arise around the same time, including dbdev and pgxman, and Yurri at Omnigres is working on something as well. Huge opportunity to solve this together. Steven Miller in chat: “I think it is very nice way to install other extensions via an extension how dbdev works” David Wheeler (he/him) in chat: “GRANT!” Grant Holly in chat: “Oh hi”

Lessons Learned

It’s a really hard problem to solve! As you add more layers of complexity, like different architectures, versions of Postgres, it gets harder and harder.

Steven Miller in chat, Replying to “The approach to use …”: “The downside for this approach includes missing chip-specific instruction support, for example AVX512, which optimizes performance on some extensions. However if you are building with docker on the same architecture as the host, then it still includes these instructions.”

David Wheeler (he/him) in chat, Replying to “The approach to use …” “Also presumably no support for building for non-Linux platforms, yes?”

The extension ecosystem truly is the wild west, not really best practices around building, versioning, and releasing, and when you’re collecting and housing them, it makes things difficult. A huge opportunity for us to come up with those standards and share them with the community.

Community involvement is crucial, wish we’d done it better early on, that’s why we’re all here today! Solution to build together doesn’t happen if we don’t tackle it as a community.

Similarly, wish we’d reached out to folks like David and Devrim early on, to get more insight from them and bring the community into the project from the beginning

The future of trunk

Registry and CLI will continue to serve Tembo Cloud

Has paved the way for binary packaging and distribution in PGXN v2 that David is spearheading, will at least will inform and perhaps be the basis for that part of the project.

That’s all, thank you, back to you, David!

Discussion

David Wheeler (he/him): Thanks for history an context, Ian! Questions or discussion topics? Some comments in the thread from Steven and Tobias.

Tobias Bussmann: in chat: speaking of paths: it would be super helpful if postgresql would support loading extensions from additional paths beside the $SHAREDIR/extension and $PKGLIBDIR directories. At least following directory symlinks within…

Steven Miller in chat, Replying to “The approach to use …”: I tried to make it work for Mac, for example, but the docker support didn’t work basically. I think it might work for Windows, since they have better container support. However I didn’t try that yet.

David Wheeler (he/him): Reads Tobias’s comment. You can specify a subdirectory in the sharedir and maybe the moduledir? But it’s a little hinky right now. Steve, do you want to talk about the us of Docker to build images?

Steven Miller: Yeah, I’d love to. To Tobias’s point, agree, on Tembo Cloud, we have a persistent directory where wer’re sintalling extensions, but because there is no way for an extra sharedir or package dir, we’re persisting all of the library files, including Postgres core. Not ideal, especially for upgrades.

Approach for building ind Docker: been nice, do the build ina Dockerfile, start the container, then install and compare the difference between layers and zip up all the new files. Great for cross-compile but, not working for mac or other systems. Will need a fallback option to do a local build.

Jeremy S: in chat, Replying to “speaking of paths: i…”: Exactly same point was also mentioned just this morning on slack by Matthias

David Wheeler (he/him): Makes sense, thanks. What other bits do you feel like could be useful for packaging binaries at a community level?

Steven Miller: Sometimes we install binaries with trunk, but then difficult to know what has been installed. Nothing like apt where there is a history of what is installed or uninstall. Would be nice to do something like trunk list and see everything that has been installed. Also, future should be not just install but management, including turning extensions on, and there are a lot of ways to turn them on.

Ian Stanton: uninstall would be useful, too.

David Wheeler (he/him): Other questions about trunk or challenges to binary distribution it brings?

*Tobias Bussmann in chat, Replying to “speaking of paths: i…”: this would allow an immutable PostgreSQL base package and still allow to install extensions on top. This is esp. important if you need to have singned packages like on macOS

*nils in chat, Replying to “speaking of paths: i…”: Guess there is some prior art in how search_path in pg work, or the PATH in unix’s.

Should be doable to allow to specify some kind of ordered search path, where Postgres will look for extensions. That way, Postgres can protect it’s own libs to no be overwritten by external libs, but allow for loading them from extra paths.

Yurri: There is CREATE EXTENSION and other extensions like logical decoding plugins. Does trunk handle them?

Steven Miller: We think of it as four types extensions into 2x2 matrix: 1. Does it require CREATE EXTENSION true or false; and 2. Does it have a loadable library true or false. The false/false category is output plugins; The true/true category, e.g. pg_partman, pg_cron; CREATE EXTENSION false and loadable library true, e.g., autoexplain, just a library, no upgrade concerns; and then CREATE EXTENSION true and loadable library false is the default case.

Ian Stanton: Steven wrote a blog on this.

Eric in chat: Does pgrx make the process of building easier or harder and is there anything we can do today to make rust extension building better?

Jason Petersen in chat: Yeah, it sounds like we need some sort of system database like apt has; would enable management, uninstall, version list, whether upgrades are available, etc

Yurri: That would be great. What other modules are there without extensions, like autoexplain?

Ian Stanton: auth delay is another, base backup to shell, considered parts of postgres, but we have trouble categorizing them. There are 10-15 I’ve come across.

Yurri: ARe these categories on Tembo, can you click a button?

Ian Stanton: Not a category, but would be a good one to add.

Steven Miller in chat: This one! https://tembo.io/blog/four-types-of-extensions

It’s in the API metadata

Sorry if I mispronounced your name Tobias

David Wheeler (he/him) in chat: SAME

Yurri: Did you say output plugins are handled with Tembo:

Steven Miller: YOu can install them with trunk, yes.

Yurri: And you have the build pipeline that will work without plugins too, yeah, cool.

David Wheeler (he/him): Tobias, did you want to say more about the path issues?

Tobias Bussmann: Sure! We are building the Postgres.app, distribution for macOS, working different from Linux systems. We distribute some extensions directly, but also allow building and installing extensions on it. Works nicely, even with pgxn client, but it’s built within the application, which breaks the code signature.

We always have to fight against a breaking system to allow that. Possible, but would be much cleaner to specify an extra directory where extensions could be loaded, and we could distribute packages with binary extensions that the user could download and install separately from the Postgres.app.

David Wheeler (he/him): You’re not suggesting a different directory for every extension with a module, but just another path in the search path that’s not subject to the signature verification.

Tobias Bussmann: Yes, that would be an option, but with a flexible system could use one per extension or just specify a second directory. Contrib extensions sometimes seen as part of Postgres, and they’re all stuffed in the same directory with third party extensions, which gets confusing and hard to manage.

Steven Miller in chat: In the previous extensions mini summit, Yuri mentioned that he was working on a patch to postgres for extra libdir, extra share dir, but I have not been tracking this one

nils: That’s what I was saying in chat, there is prior art in Postgres and Unix systems where you can specify a search path in postgres for a list of schemas, and in Unix the path is to find binaries. Give me a delimited list of directories on my system. Could be super user only, where they can specify where they’re installed, and we can go through the list ot find an extension.

David Wheeler (he/him): I might be imagining this, but I seem to recall there was a proposal to have extensions in their own directories, which would be nice for packaging, but then every time you add one you have to add another directory to the list and there is some fear the lookup time could be too long.

Jeremy S in chat, replying to “speaking of paths: i…”: (Or like LD_LIBRARY_PATH )

David Wheeler (he/him) in chat, replying to “speaking of paths: i…”: LD_LIBRARY_PATH is all but dead on macOS

Jason Petersen: If it happens at startup I can’t imagine that being a concern. If the list changes you reboot. It’s not gonna be a performance problem, I hope.

*David Wheeler (he/him): Or HUP it if you don’t want downtime.

Jason Petersen: Sure, but it doesn’t need to be on every command.

*David Wheeler (he/him): Eric, do you want to pose your question about pgrx?

Eric: Sure. Wanted to know, were there stumbling blocks to get pgrx support built into trunk, and does it make things easy or difficult? Different from C path, are there things we could do to make things easier today?

Ian Stanton: Yeah, I think the issue is mostly on our end. We have a separate image for each version of pgrx, and keeping up with the releases is challenging. We need to rethink our image building strategy. Shouldn’t be one image for each version of pgrx. That’s the biggest thing I’ve noticed, mostly on our side.

*David Wheeler (he/him): Because you need the install the version of pgrx that the extension requires before you do the build, and that’s just too slow?

Ian Stanton: Could be too slow. We’ve known about this problem for some time, just hasn’t been addressed yet.

Eric: Okay, maybe we can talk about it offline one day, be happy to chat. I think we’re close to being able to have the CLI, cargo-pgrx, be a different version than whatever version the extension uses.

Ian Stanton: That would be super useful!

Eric: Yeah, I think we’re close to being at that point, if not there already. We can talk about that offline.

Ian Stanton: Nice! We’ll reach out in Discord.

*David Wheeler (he/him): Other comments or questions, or people who have worked on other kinds of binary registry things, would love to hear more from other perspectives. Devrim is going to talk about the Yum repository next week [ed. correction: in two weeks].

Steven Miller in chat: Daniele last time mentioned Pip is good example of mixing source and binary distributions

Eric: I have a random question related to this. In the past and recent history, has hackers talked about some way of storing extension in the database rather than relying on the file system?

*David Wheeler (he/him): Yes! In this long thread from 2011 [ed. Correction: 2013] Dimitri was proposing a “unit”, a placeholder name, where the object would be stored in the database. Very long thread, I didn’t read the whole thing, lot of security challenges with it. If it needs a shared object library loading having to be written to the file system it’s just not going to happen. I don’t know whether that’d be required or not.

Dimitri also worked on a project called pginstall where you could install extensions from the database like dbdev, but not just TLEs, but anything. The idea is a build farm would build binaries and the function in the database would go to the registry and pull down the binaries and put them in the right places on the file system.

There were a lot of interesting ideas floating around, but because of the legacy of the PGXS stuff, it has always been a bit of a struggle to decide not to use it, to support something not just on the machine, but do something over libpq or in SQL. Lot of talk, not a lot of action.

*Tobias Bussmann in chat in response to “In the previous ex…”: still searching on hacker for it. Meanwhile I found: https://commitfest.postgresql.org/5/170/

Steven Miller in chat: That approach is very awesome (install via extension)

Eric: I can see why it would take some time to sort it all out. One thing to require super user privileges to create an extension, but also having root on the box itself? Yeah.

Yurri: TLE plugs into that a little bit for a non-shared object. Not exactly storing it in the database, but does provide a SQL based/function method of installing from inside the database, but only for trusted languages, not shared objects.

*David Wheeler (he/him): dbdev install does download it from database.dev and stores it in the database, and has hooks into the CREATE EXTENSION command and pulls it out of its own catalog. Was a similar model with pginstall, but with binary support, too.

Yurri: Back to trunk. When you start building, and have to deal with binaries, pgxn you can put the source up there, but I want to get to the whole matrix of all the different versions. Every extension author does it a little different. Some extensions have versions for Postgres 15, another for 14, some have the same version across all the majors, sometimes an extension works for some majors and others. Has trunk expanded to other Postgres versions to support the whole exploding matrix of stuff that does and doesn’t work, 5-6 majors, gets to be a large matrix, a lot to keep track of. How’s that working out for the builds and managing that matrix.

Steven Miller in chat: Dimensions I think are: pg version architecture chip-specific instructions (edge case for native builds?)

Steven Miller in chat: We just announced support for 14 and 16

David Wheeler (he/him) in chat, replying to “Dimensions I think a…”: OS, OS version

Steven Miller in chat,: Replying to “Dimensions I think a…”: Ah right

Ian Stanton: Steven do you want to take that one?

Steven Miller: Oh yeah. We’ve started toe-dipping on this one. Started with Tembo Cloud’s platform, but have no released Postgres 14 and 16, and also trunk has built-in support for other architectures, such as arm, or whatever the Docker --platform flag supports. We looked at mac builds, not working yet, might work for Windows, which ahs better container support, but I don’t know, and also there is an edge case for pg_vector especially, which compiles to include ship-specific instructions for AVX512, which helps with vector. So that’s another dimension to consider.

Yurri: Part of the idea behind this forum is to see if we can chart a path forward, maybe not solve everything. What can we solve, how can we make something a little better for Postgres at large?

Eric in chat: Even as a Mac user I don’t know the answer to this… what’s the common Postgres package there? Postgres dot app, homebrew, something else?

David Wheeler (he/him) in chat: pgenv! (self-promotion)

Eric in chat: I assume folks don’t use macOS in prod but developers are important too

nils in chat, Replying to “Even as a Mac user I…”:

$ git clone .. $ ./configure $ make $ make install

At least that is what I do 😄

Steven Miller: In my opinion, the way to approach it is to know all the dimensions you need, and in the metadata API say which binaries are available. Then get through it with testing and badging If we let things get built, to what extent is it tested and used? That can help. Daniele was in the previous call, said we could look to Pip and Wheel files for inspiration, and Adam on our team has said the same. This is something that has some binary and some source, and falls back on doing the build when it needs to.

*David Wheeler (he/him): I’ve been thinking about this quite a bit lately. Can see needing to take advantage of multiple platforms available through GitHub workflow nodes or the community’s build farm, which has a vast array of different architectures and platforms to build stuff. There are precedents!

I imagine a system where, when something is published on PGXN, another system is notified and queues it up to all its build farm members to build binaries, ideally without full paths like trunk, and making them available for those platforms. Building out that infrastructure will take a fair bit of effort, I think. With cross-compiling is available it might be…doable? But most modules and for SQL and maybe Rust or Go extensions, but a challenge for C extensions.

This is a problem I’d like us to solve in the next year or two.

Steven Miller in chat, replying to “I assume folks don’t…”: Yeah exactly, like trunk install after brew install postgres

Tobias Bussmann in chat, replying to “Even as a Mac user…”: this seems to be quite spread. There are also people that prefer docker based installs

Eric in chat: pgrx supports cross compilation

With a caveat or two!

Eric in chat, replying to “Even as a Mac user I…” @nils same. For v9.3 though 16!

*David Wheeler (he/him): What else? Reading the comments.

Yurri: I think maybe that PGXN JSON file, I know you’ve been spending time on it, David, including the proposal on namespacing a few days ago. That feels like it could be helpful to be part of this. IF it could be something we could center around… The first time I wanted to put an extension on PGXN, it took me a long time to figure out that JSON file. I didn’t find the blog post that goes through it in nice detail till like two weeks after. If I’d found it sooner I could have skipped so many things I tried to figure out on my own.

If we can center around that file, it’ll draw more attention to it, more links back to it, more examples people blog about here and there, it helps going forward. The trick is getting it right not being this massive thing no one can figure out, or has too many options, but hits all the points we need.

nils in chat, replying to “Even as a Mac user I…”: Well, mostly for extension, for Postgres I rely on David’s pgenv

*Eric * in chat, replying to “Even as a Mac user I…”: @Tobias Bussmann hmm. Makes it difficult to get an extension installed.

*David Wheeler (he/him): I’ve been thinking about this a lot, drafted a doc some of my colleagues at Tembo have read over and I hope to publish soon [ed. Note: now published], thinking through what a v2 of the PGXN Meta Spec might include. I think we should extend with list of external libraries required, or the architectures it supports, or it’s a loadable library or an app that doesn’t even go into the database.

I would like soon to draft an actual revision of the spec, and document it well but also turn it into a JSON Schema document so we can automate publishing it and verification in the same place. I also imagine building an eventual replacement or evolution of the PGXN client or trunk client or some client that you can use to manage that thing. I think pgrx does that, adding metadata via the client rather than parse and understand the whole file.

I’m with you it could get really complicated, but I’m not sure I see an alternative other than building good tooling to minimize the pain.

Ian Stanton: I think automatically pulling that information when it’s readily available would be super helpful. We use it as an app to just take care of things for people.

*David Wheeler (he/him): Right, and then if we’re successful in getting it done it’s getting people to take up the tools and start using them. There’s only so much we can infer. I can tell how to do a build if there’s a Makefile or a configure file or a cargo.toml, but that doesn’t reveal what libraries are required. This is why there’s a lot of hand-tuning of RPM and Apt spec files.

Steven Miller in chat: We are calling this “system dependencies”

Ssl and glibc the main ones 🙂

Jason Petersen in chat: And sometimes the package names aren’t even 1—1 mappings

Eric in chat: Ha! Try relying on elasticsearch as a runtime dependency! 😞

Yurri: That’s another thing to touch on. A lot of extensions are just a thin layer of glue between Postgres and some OSS library that someone else maintains. But the trick, when you want to build a Yum package, the dependency has a different name than the rest of the RedHat ecosystem vs. the Debian ecosystem. So part of what Devrim has to do to maintain the RPM packages is manually sort all that out, because you can’t automatically… libc! It’s called glibc in RedHat and just libc in Debian, and every package has slightly different names. Do how do you manage that in trunk? Do you pull the source for any dependencies? Does your Docker image…I don’t know how this is working.

David Wheeler (he/him) in chat: I want to build a json schema validation extension in Rust using https://github.com/Stranger6667/jsonschema-rs or something

Tobias Bussmann in chat, replying to “Ha! Try relying o…”: or V8 🤯

Ian Stanton: Two sides to that one is build time dependencies, and there there are runtime dependencies. I just dropped an example for some random extension. Tthe way we’ve been building this is to write out a Dockerfile that can include build time dependencies. [hunts for link…]

Ian Stanton in chat: https://github.com/tembo-io/trunk/blob/main/contrib/age/Dockerfile

Ian Stanton: We specify them all there. But for runtime, we don’t know what’s required until we test the thing. We have stuff in our CI pipelines to install and enable the extension to see if it works. If it doesn’t, it will report a missing dependency. Then we know we need to add it to our Postgres images. Not the best flow for finding these dependencies. Steven, want to add anything more to the build time dependency piece?

David Wheeler (he/him) in chat, replying to “Ha! Try relying on …”: Next version of plv8 released on PGXN will have v8 bundled

Steven Miller: A lot share the same ones, SSL and glibc, so we just build with the same versions we run on Tembo Cloud. In the metadata we list all system dependencies, that’s what we build towards, and include them in the Docker image. If you pick a different stack, like the Machine Learning stack, it has all the Python stuff in the base image. We don’t really love this, but this is something where Python wheel might inspire us, becaus it has packaging and system dependencies.

Eric in chat, replying to “I want to build a js…”: I feel like I’ve seen one already?

David Wheeler (he/him) in chat, replying to “I want to build a js…”: GIMME

Yurri: If you really want to od this right, just like in the RPM repositories, you have to know what the dependencies are. David, I’m curious, what your thoughts are, if this is to be done right, there has to be a way to indicate dependencies in the META.json file, but then I’m talking about Debian and RedHat, but what about Mac? Windows doesn’t really have a packaging system. There are BSDs, other places Postgres can run, probably have to narrow the scope a bit to solve something.

Tobias Bussmann in chat, responding to “Ha! Try relying o…” Sounds promising, but for which architectures? I have good hope for pljs as replacement for plv8

Ian Stanton in chat: https://github.com/tembo-io/trunk/blob/d199346/contrib/fuzzystrmatch/Trunk.toml#L13

David Wheeler (he/him): Fortunately there are only around 1100 extensions in the world, a relatively low barrier at this point. Some of these other things have thousands or millions of extensions.

Yurri: I guess when you put it that way! But I wasn’t going to go through all 1000 of them one-at-a-time.

David Wheeler (he/him): No. I posted about this on Ivory a few weeks ago [ed. correction: he means on Mastodon]: how does one do this in a platform-neutral way. There are some emerging standards where people are trying to figure this stuff out. One is called purl, where you specify dependencies by packing URLs, or “purls”, and then it’s up to the installing client to resolve them vai whatever the packaging system it depends on.

I would assume on Windows we’d have to say “it works great as long as you use Chocolatey” or something like that. But it’s certainly a difficult problem. I’m looking forward to your talk about your unique approach to solving it, Yurrii [ed. note: that’s the May 1 mini-summit], that’s going to be super interesting.

David G. Johnston: Ultimately you just crowd sourcing. If we just say “this is what we call this thing in PostgreSQL world”, then if people need to compile it on Chocolatey on Windows, they figure it out and contribute it. Or on Debian or RedHat. Just facilitate crowd-sourcing, metadata in a database.

David Wheeler (he/him): My initial idea was a global registry that people contribute to just by editing files in a GitHub repository.

David G. Johnston: HashiCorp has to have something like that already, there’s stuff out there, no need to reinvent the wheel. This is a global problem if we open-source it we can solve it.

David Wheeler (he/him): Right. Really appreciate everyone coming. Great discussion, I appreciate it. In two weeks, Devrim Gündüz is going to talk about the Yum Community Repository and the challenges of RPMifying extensions. I had this idea of automating adding extensions to the Yum and Apt repositories, an Devrim is a little skeptical. So super look forward to his perspective on this stuff. Two weeks from today at noon [ed.: America/New_York]. Thanks for coming!

Eric in chat: Thanks a ton! This is exciting stuff.

Tobias Bussmann in chat: Thanks all!

Grant Holly in chat: Thanks everyone. Great discussion

Jeremy S: in chat: Thanks david

Steven Miller in chat: Thanks all! Cya next time

Jeremy S in chat: Isn’t bagel supposed to come for the end

Ian Stanton in chat: Thanks all :)

More about… Postgres PGXN Extensions PGConf Summit trunk

Sunday, 24. March 2024

Werdmüller on Medium

How to build a culture that supports engineers

And is productive for workers in every creative discipline Continue reading on Medium »

And is productive for workers in every creative discipline

Continue reading on Medium »

Friday, 22. March 2024

Identity Woman

Real Names & Glassdoor & Fishbowl

There is a Social Contract with sites and their name policies. Some how companies still don’t really understand this (See what WOW did in 2010) . There is a massive story breaking this week about how GlassDoor a site where people who can create profiles and anonymously rate their employers over their entire careers has […] The post Real Names & Glassdoor & Fishbowl appeared first on Iden

There is a Social Contract with sites and their name policies. Some how companies still don’t really understand this (See what WOW did in 2010) . There is a massive story breaking this week about how GlassDoor a site where people who can create profiles and anonymously rate their employers over their entire careers has […]

The post Real Names & Glassdoor & Fishbowl appeared first on Identity Woman.

Thursday, 21. March 2024

Just a Theory

RFC: PGXN Metadata Sketch

Request for comments on a sketch of a new metadata standard for Postgres extension packaging, distribution, and delivery, building on the PGXN Meta Spec to address its shortcomings and emerging use cases 12 years on.

Update 2024-03-22: To compliment high-level discussions on the #extensions channel on the Postgres Slack, I also created a pull request to make it easer directly comment to this post point by point. Keep the feedback coming, and thank you!

Update 2024-03-25: Clarified the definition of “Release” and made “Maintainers” plural. Thanks to Matthias van de Meent for the suggestions!

This post proposes a new metadata standard for extension packaging, distribution, and delivery, building on the PGXN Meta Spec to address its shortcomings and emerging use cases 12 years on. The goals include:

Don’t break the existing standard, but add to it Add explicit support for different types of Postgres extensions, such as background workers and loadable modules Add additional metadata for curation and automated binary compilation Add hooks for registry and third-party-contributed metadata, including binary distribution options and stats & reports.

There are two high level categories of metadata, split into two separate files:

Package Metadata provided by extension package authors Registry Metadata aggregated by the root registry from various sources, including data derived from the extension source code or package metadata, but also trusted third parties such as packagers, smoke testers, security scanners, and more.

Following community discussion of this proposal, the Package Metadata will lead to a draft for PGXN Meta Spec version 2.0.0, while the Registry Metadata will guide the design and implementation of the Root Registry APIs required to provide it.

Terms

Definition of terms as used in this document, because every language and distribution system uses terms differently. These are ours.

Extension: a software component that extends the capabilities of a PostgreSQL database or cluster. Extensions may be CREATE EXTENSION extensions, background workers, command-line apps, loadable modules, shared libraries, and more.

Package: A collection of extensions that are released, versioned, and distributed together. Packages may be downloaded directly from version control repositories or in archive files generated by a release tag.

Package Path: Identifies a package, declared in the package metadata file. A package path should describe both what the package does and where to find it. Typically, a package path consists of a repository root path — the directory that contains the metadata file — and a directory within the repository.

Repository: The version control repository where the package is developed, referenced as package path or URL.

Repository root path: The portion of the package path that corresponds to the root directory of the version control repository where the package is developed, and which contains the metadata file. Most package are defined in their repository’s root directory, so this is usually the entire path. For example, github.com/example/pgtap is the repository root path for the package of the same name.

Source Distribution: The contents of a single package bundled together with package metadata into distributable archive file, usually named with the last part of the package path or the main extension, a dash, and the version, e.g., pgtap-1.14.3.zip.

Binary Distribution: A binary software distribution of a package, compiled and packaged for a particular OS version, platform, and hardware architecture. Examples include deb, RPM, and wheel.

Release: A single version of the package made available to the public on PGXN, expressed as the package path, an at sign, and the semver. Example: github.com/theory/pgtap@v1.14.3.

Package Metadata

The extension author provides this data as part of a release submitted to the root registry. The registry itself will not change this data in any way, but provide it as-is.

Essentials

Essential information about the package itself, including its path, name, version, list of authors, license, etc. Ecosystem applications use this data for indexing, naming conventions, and display information.

Fields (all required):

Package: A package path uniquely identifying the package (usually the source repo host name and path; see this RFC) Name: The name of the extension package, usually the same as the main extension, not unique but conflicts are resolved by escalation to package Version: The version of the package, expressed as a SemVer Abstract: A short description of the purpose of the package Description: A longer description of the package, answering the question “what is this thing and what value is it?” Maintainers: List of maintainers, each an object with name and either email or url (or both) License: An SPDX 2.3 license expression comprised of one or more licenses from the SPDX License List Spec: Information about the metadata spec itself, a version and URL

Example:

{ "package": "github.com/example/pg-pair", "name": "pair", "version": "1.1.0", "abstract": "A key/value pair data type", "description": "Adds a key/value pair data type to Postgres, useful for specifying arbitrary key/value function parameters.", "maintainers": [ { "name": "Naomi Nagata", "email": "naomi@example.com", "url": "https://example.com/roci/naomi" } ], "license": "PostgreSQL OR MIT", "spec": { "version": "2.0.0", "url": "https://pgxn.org/meta/spec.txt" } }

Other possibilities:

License File: An alternative to license for nonstandard licenses (from cargo) Contributors: List of people who have contributed to the package Emeriti: List of former maintainers and contributors

PGXN Compatibility:

Overloads the string representation of the spec v1 field, but it should be able to parse either one, especially with the spec version to differentiate. Uniqueness of package and relationship to name PGXN has maintainer; is it really worth pluralizing it? Maybe we don’t need it at all, given repository and issue links. Artifacts

A list of links and checksums for downloading the extension package in one or more formats, including source code, binaries, system packages, and more. Apps use this information (along with registry metadata) to determine the best option for installing an extension on a particular system.

This information is optional, and provided in addition to the source code available from the root registry itself. The idea is that, on release, the release manager creates artifacts in one or more locations (such as GitHub releases), and it’s useful to link to them from the registry. Binary builds may be unlikely today, but leave the option open for now by allowing arbitrary text in the type field.

Each item must have:

Type: The type of artifact: Source, binary, RPM, etc. URL: Link the the artifact SHA: A SHA checksum

Each URL must properly resolve and the checksum must match.

[ { "type": "source", "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0.zip", "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004" }, { "type": "binary", "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-amd64.tar.gz", "sha1": "12d9bc5cfb6bc3c453627eac69511f48be63cfc0" }, { "type": "binary", "url": "https://github.com/theory/pg-pair/releases/download/v1.1.0/pair-1.1.0-linux-arm64.tar.gz", "sha1": "787dc39137f7d1510a33ab0a1b8905cd5f3f72d1" } ]

PGXN Compatibility:

New object, no conflicts.

Resources

External information about the package, mostly links, including source code repository, bug reporting, documentation, badges, funding, etc. Apps use this data for links, of course, but also full text indexing, documentation rendering, and displaying useful information about the extension.

Fields:

Homepage: URL for the package’s home page Issues: URL to issues Documentation: URL to documentation Support: URL to support Repository: Source code repository URL Badges: Array of badge link objects Funding: Array of funding objects { "homepage": "https://pair.example.com", "issues": "https://github.com/example/pair/issues", "documentation": "https://pair.example.com/docs", "support": "https://github.com/example/pair/discussions", "repository": "https://github.com/example/pair", "badges": [ { "alt": "Test Status", "src": "https://test.packages.postgresql.org/github.com/example/pair.svg" } ], "funding": [ { "type" : "individual", "url" : "http://example.com/donate" }, { "type" : "patreon", "url" : "https://www.patreon.com/example" } ] }

PGXN Compatibility:

issues replaces bugtracker and is just a URL. Can support either repository is just a URL instead of an object. Can support either but with the package key referencing the repository path perhaps we don’t need to change it. Contents

A description of what’s included in the package. Often a package consists of multiple extensions, such as PostGIS, which includes postgis, postgis_tiger_geocoder, address_standardizer, and more.

Furthermore, some extensions are not CREATE EXTENSION-type extensions at all, such as background workers, command-line apps, loadable modules, shared libraries, and more. Each should be listed along with documentation links where they differ from the package overall (or are simply more specific).

Fields:

Extensions: Object describing CREATE EXTENSION extensions. Keys are extension names, fields are objects with the fields sql, doc, abstract, control (same as control file, but can use the file itself), tle (meaning it can be used as a trusted language extension), and preload (to indicate whether its libraries need to be loaded in advance via shared_preload_libraries, session_preload_libraries, or local_preload_libraries). Workers: Object describing background workers. Keys are worker names, fields are objects with the fields bin, doc, and abstract. Apps: Object describing applications, command-line or otherwise. Keys are app names, fields are objects with the fields bin, doc, and abstract. Modules: Object describing loadable modules that can be loaded into Postgres (not necessary for extensions that include libraries). Keys are module names, fields are objects with the fields lib (without file extension), doc, abstract, and preload. Libraries: Other libraries that may ship in the package and need to be installed but are not loadable modules, such as a dynamic library used by an app. Keys are library names, fields are objects with the fields lib (without file extension) or dir (for a directory of files), doc, and abstract. { "extensions": { "pair": { "sql": "sql/pair.sql", "doc": "doc/pair.md", "abstract": "A key/value pair data type", "preload": "session", "tle": true, "control": { "directory": "pair", "module_pathname": "$libdir/pair", "relocatable": true } } }, "workers": { "pair_pruner": { "bin": "bin/pair_pruner", "doc": "doc/pair_pruner.md", "abstract": "A worker to periodically prune pairs" } }, "apps": { "pair_rand": { "bin": "bin/pair_rand", "doc": "doc/pair_rand.md", "abstract": "Command to generate random pairs of strings" } }, "modules": { "lib_pair": { "lib": "lib/lib_pair", "doc": "doc/lib_pair.md", "abstract": "A library hooking function calls to convert pairs to named parameters", "load": "shared_preload_libraries" } }, "libraries": { "ruby_pair": { "dir": "lib/gems", "abstract": "Ruby libraries required to run the extension" } } }

Other options:

Allow or require an array of docs? Specify support for specific doc formats (markdown, manpage, plain text, MkDocs, etc.)? Support glob patterns? Add libraries as sub-key for workers and apps, e.g. for a Ruby app that requires a directory of gems?

PGXN Compatibility:

Significantly different from the existing provides object, so key this one under contents or manifest and continue to support both.

Dependencies

A list of external dependencies required to configure, build, test, install, and run the extensions in the package. These include not only other extension packages, but also external libraries and system dependencies. And let’s not forget the versions of Postgres required, as well as any OS and version dependencies (e.g, does it work on Windows? FreeBSD? What versions?) and architectures (arm64, amd64, etc.)

Fields:

platforms: Platforms supported by the package, defined as OS and architecture. Assume all platforms if none are listed.

postgres: Versions of PostgreSQL required by the package, specified as a version range, as well as compiled features (--with-* configure options)

pipeline: The build pipeline required to build the extension. Options: pgxs, meson (?) pgrx, gnu (./configure), gem, cpan, pip, go, etc.

prereqs: Object defining external dependencies required for different phases:

configure: Dependencies to configure the package (e.g., items required for ./configure or make to work) build: Dependencies to build the package (in addition to run dependencies) test: Dependencies to test the package (in addition to build and run dependencies) run: Dependencies to run the package develop: Dependencies to develop the package (in addition to all other phase dependencies)

Each points to an object with at least one of these keys:

requires: Required to use the package recommends: Not required, but recommended as a best practice suggests: Not required, but nice to have conflicts: Package will not work with these items

These, in turn, contain at least one of these keys:

package: Other packages, expressed by their package paths, pointing to a version expression external: External dependencies not included with the package contrib: List of Postgres contrib or development packages

variations: A list of dependency variations, each with two fields:

where: Specifies a configuration that, when matched, includes the dependencies. For example, to specify alternate dependencies for Linux arm64, it would be "platforms": { "linux": "arm64" }. For a specific version of Postgres, it would be something like "postgres": { "version": ">= 16, < 17" }. [This likely needs some deep thinking through.] dependencies: An object containing of the other key structures in this object, defining the additional dependencies for environments that match the where field.

The packages and external keys point to objects in which the keys are dependency names and the values are version range expressions (empty string means any version). The postgres.version field is also a version range expression.

The contrib key points to an array of Postgres-supplied apps and extensions. Versions are not necessary because they effectively have the same version as a supported Postgres version.

Example:

{ "postgres": { "version": ">= 12, < 17", "with": [ "xml", "uuid", "perl" ] }, "pipeline": "pgrx", "platforms": { "linux": [ "amd64", "arm64" ], "darwin": [ "amd64", "arm64" ], "windows": [ "amd64" ], "freebsd": [ "amd64" ] }, "dependencies": { "configure": { "requires": { "external": { "cargo-pgrx": "" } } }, "build": { "requires": { "external": { "curl": "", "awk": "", "sed": "", "perl": "5.20" } }, "recommends": { "external": { "jq": "", "perl": "5.38" } } }, "test": { "requires": { "contrib": [ "pg_regress", "plpgsql", "plperl" ], "packages": { "github.com/theory/pgtap": "1.1.0" } } }, "run": { "requires": { "contrib": [ "plperl" ], "packages": { "github.com/theory/hostname": "" } } } } }

Notes:

The external field is the tricky one. How does one provide a canonical dependency for a third-party library or app that may have loads of different names on different platforms, different packaging systems (apt vs rpm), and even different versions of them, or split apart for development and runtime use? Some possibilities:

Use purl and let downstream services handle the conversion Create a rule set that allows one to predict dependency name changes across ecosystems, similar to repology-rules Just use Repology?

Other notes:

Decide on a version range standards to adopt. Start with the PGXN version range, pep-0508, npm ranges, and Gem::Version. Is pipeline really necessary, given configure requirements? I think so, because it tells the client the preferred build system to use, in case it can’t detect it for some reason.

PGXN Compatibility:

Significantly differs from the existing prereqs object, so this one is keyed under dependencies and we can support either.

Classification

Classification metadata lets the extension developer associate additional information to improve discovery, such as key words. It might also allow selections from a curated list of extension classifications, such as the category slugs supported for the cargo categories field.

Ecosystem apps use this data to organize extensions under key words or categories, making it easier for users to find extensions that are frequently used together or for specific workloads or tasks.

Example:

{ "tags": [ "testing", "pair", "parameter" ], "categories": [ "Machine Learning" ] }

PGXN Compatibility:

Conflicts with the existing root-level tags array, but we can support either one.

Example: PGML Extension

A hand-coded example that attempts to capture all the complexity of the pgml extension:

{ "package": "github.com/postgresml/postgresml/pgml-extension", "name": "pgml", "version": "2.8.2", "abstract": "pgml: Created by the PostgresML team", "maintainers": [ { "name": "the PostgresML team", "url": "https://github.com/postgresml/postgresml/" } ], "license": "MIT", "spec": { "version": "2.0.0", "url": "https://pgxn.org/meta/spec.txt" }, "artifacts": [ { "type": "source zip", "url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.zip", "sha256": "2b9d2416096d2930be51e5332b70bcd97846947777a93e4a3d65fe1b5fd7b004" }, { "type": "source tgz", "url": "https://github.com/postgresml/postgresml/archive/refs/tags/v2.8.2.tar.gz", "sha256": "845f28339c6159ac32daccea1cd17b386ea083c3e60bb8d58fb737725afe7eb5" } ], "resources": { "homepage": "https://postgresml.org/", "issues": "https://github.com/postgresml/postgresml/issues", "documentation": "https://postgresml.org/docs/", "support": "https://discord.com/invite/DmyJP3qJ7U", "repository": "https://github.com/postgresml/postgresml", "badges": [ { "alt": "Tests Passing", "src": "https://github.com/postgresml/postgresml/actions/workflows/ci.yml/badge.svg" }, { "alt": "Chat Activity", "src": "https://camo.githubusercontent.com/1988e7aaaa6a0c8ddcf880faacf1ec3263e23fac4aeff4710922082fde84442a/68747470733a2f2f696d672e736869656c64732e696f2f646973636f72642f31303133383638323433303336393330303939", "link": "https://discord.gg/DmyJP3qJ7U" } ] }, "manifest": { "extensions": { "pgml": { "sql": "sql/pgml--2.8.2.sql", "preload": "shared", "tle": false, "control": { "comment": "pgml: Created by the PostgresML team", "default_version": "2.8.2", "module_pathname": "$libdir/pgml", "relocatable": false, "superuser": true, "schema": "pgml", "trusted": true } } } }, "dependencies": { "postgres": { "version": ">= 15, < 16" }, "pipeline": "pgxs", "platforms": { "linux": [ "amd64", "arm64" ], "darwin": [ "amd64", "arm64" ] }, "dependencies": { "configure": { "requires": { "external": { "cargo-pgrx": "", "bison": "", "cmake": "", "flex": "", "libclang-dev": "", "libopenblas-dev": "", "libpython3-dev": "", "libreadline-dev": "", "libssl-dev": "", "pkg-config": "" } } }, "run": { "requires": { "external": { "libopenblas": "", "libpython3": "", "libreadline": "", "libssl": "", "python3": "" } }, "recommends": { "external": { "python(pyarrow)": "=11.0.0", "python(catboost)": "", "python(lightgbm)": "", "python(torch)": "", "python(torchaudio)": "", "python(torchvision)": "", "python(xgboost)": "", "python(accelerate)": "", "python(bitsandbytes)": "", "python(ctransformers)": "", "python(huggingface-hub)": "", "python(deepspeed)": "", "python(einops)": "", "python(optimum)": "", "python(peft)": "", "python(tokenizers)": "", "python(transformers)": "", "python(transformers-stream-generator)": "", "python(InstructorEmbedding)": "", "python(sentence-transformers)": "", "python(rouge)": "", "python(sacrebleu)": "", "python(sacremoses)": "", "python(datasets)": "", "python(orjson)": "", "python(langchain)": "" } } } }, "variations": [ { "where": { "platforms": { "linux": [] } }, "dependencies": { "prereqs": { "run": { "recommends": { "external": { "python(auto-gptq)": "", "python(xformers)": "" } } } } } } ] }, "classification": { "tags": [ "machine learning", "ml", "transformers" ], "categories": [ "Machine Learning" ] } }

Note that PostgresML also provides a dashboard app, but it does not ship with the extension, so it’s not listed here. It could have its own package metadata and be separately distributed.

For the Python dependencies, in the absence of a neutral standard for specifying package, here they’re specified using a syntax borrowed from RPM to make clear that they’re Python dependencies. A couple of those dependencies are Linux-only, so listed in variations where the OS is Linux.

Registry Metadata

The second metadata format is registry metadata. It starts by with a copy the package metadata, but builds from there, adding information (such as the artifact link, SHA, and cryptographic signature from the registry itself) and sections (defined below).

Providing a separate metadata format enables broader community collaboration to augment the metadata for extension packages.

Ideally an ecosystem of trusted services will be developed by various parties, who will report back to the root registry as the source of record for all metadata about extension packages. For example, a binary distribution service like apt.postgresql.org or trunk will submit links and checksums for binary packages and perhaps periodic download stats.

Broadly speaking, there are three typologies for registry metadata: Distributions, Reports, and Classifications.

Distributions

A list of links and checksums for downloading binary distributions for the extension package. The root registry page for the package can therefore list all the ways to install an extension and expand the list as new binary registries add it.

Along with the Artifacts package metadata, the distribution metadata allows installer tools to select the best option to install based on the OS, Postgres version, and cluster configuration (e.g., install the community Apt package if the cluster was installed from the community Apt registry).

Each item must have:

Registry: The name of the distribution registry, unique Type: The type of distribution: Apt, RPM, TLE, etc. URL: Link to the the registry’s page for the distribution Command: In lieu of a URL and SHA it can provide instructions to install the distribution, e.g., apt-get install postgresql16-pair.

Each distribution registry must be trusted by the root registry to submit this information to add to the metadata, and have its own page with additional contextual information: its home page, docs, basic instructions, whatever. A README of sorts.

Example:

[ { "registry": "trunk.pgxn.org", "type": "trunk", "command": "pgxn install github.com/example/pair" }, { "registry": "apt.postgresql.org", "type": "apt", "command": "apt-get install postgresql16-pair" }, { "registry": "yum.postgresql.org", "type": "rpm", "command": "yum install postgresql16-pair" }, { "registry": "pgt.dev", "type": "apt", "url": "https://pgt.dev/extensions/pair", "command": "trunk install pair" }, { "registry": "pgxman.com", "type": "apt", "url": "https://pgxman.com/x/pair", "command": "pgxman install pair" }, { "registry": "database.dev", "type": "sql", "url": "https://database.dev/example/pair", "command": "SELECT dbdev.install('example-pair');" }, ]

Other Options:

Information about installing from a registry that doesn’t offer URLs, like Homebrew, where one just needs to know what the formula is called. Maybe the same should apply to Apt? Metrics and Reports

This object includes data such as number of downloads, build and test status on various Postgres/OS/version combinations, binary packaging distributions, test coverage, security certification, vulnerability detection, quality metrics and user ratings, and more. Some of these bits of data might include badges, e.g., for ratings or security assessments.

In the broader ecosystem, it would be the responsibility of the root registry to ensure that data comes only from trusted sources. However this data will be publicly readable, allowing any downstream applications to extend and publish metrics and reports with their own information.

Example:

{ "aggregates": { "downloads": 32 }, "sources": { "smoke-tester.example.com": { "stats": { "passes": 32, "fails": 8, "unknown": 0 }, "link": "https://smoke-tester.example.com/extension/pair", "badge": "https://smoke-tester.example.com/extension/pair.svg" }, "pgt.dev": { "downloads": 12, "tags": ["two", "kv"], "categories": ["Analytics"] }, "pgxman.com": { "downloads": 20 }, "ratings.example.com": { "stats": { "ratings": 324, "average": 3.2, "comments": 13 }, "link": "https://ratings.example.com/extension/pair", "badge": "https://ratings.example.com/extension/pair.svg" }, "stats.example.com": { "contributors": 2, "stars": 14, "last_active": "2024-01-12", }, "security-scan.example.com": { "advisories": { "high": 2, "moderate": 11, "low": 1, "cves": 0 }, "link": "https://security-scan.example.com/extension/pair", "badge": "https://security-scan.example.com/extension/pair.svg" } } }

Notes:

The aggregates section aggregates results from multiple sources, for example summing all downloads or averaging ratings. The list of items to aggregate could evolve regularly.

Each key in sources identifies a trusted downstream source of information. Each would have its own schema describing its objects and their meaning, along with URI templates to link to. For example, stats.example.com might have these templates:

{ "contributors": { "alt": "List of people who have contributed to the {name} project", "badge": "https://stats.example.com/{ package }/contributors.svg", "uri": "https://stats.example.com/{ package }/contributors" }, "stars": { "alt": "Star ratings for the {name} project", "badge": "https://stats.example.com/{ package }/stars.svg", "uri": "https://stats.example.com/{ package }/stars" } }

Might be worth adopting more of an API of multiple files, rather than one big file. Example: Homebrew analytics. Note also its support for date ranges for analytics (30/90/365 days, stats when directly installed vs. installed as a dependency).

More about… Postgres PGXN Extensions Packaging Metadata

Thursday, 21. March 2024

Mike Jones: self-issued

Eight Specifications Published in Preparation for IETF 119

My co-authors and I published updated versions of eight specifications in preparation for IETF 119 in Brisbane. The specifications span three working groups: JOSE, COSE, and OAuth. The updated specifications and outcomes when discussed at IETF 119 are as follows. 1, 2, & 3: JSON Web Proof, JSON Proof Algorithms, and JSON Proof Token. Updates […]

My co-authors and I published updated versions of eight specifications in preparation for IETF 119 in Brisbane. The specifications span three working groups: JOSE, COSE, and OAuth. The updated specifications and outcomes when discussed at IETF 119 are as follows.

1, 2, & 3: JSON Web Proof, JSON Proof Algorithms, and JSON Proof Token. Updates were:

Normatively defined header parameters used Populated IANA Considerations sections Allowed proof representations to contain multiple base64url-encoded parts Specified representation of zero-length disclosed payloads Added Terminology sections Updated to use draft-irtf-cfrg-bbs-signatures-05 Updated to use draft-ietf-cose-bls-key-representations-04 More and better examples Improvements resulting from a full proofreading

Continued reviews and feedback from implementations are requested.

4: Fully-Specified Algorithms for JOSE and COSE. Updates were:

Published initial working group document following adoption Added text on fully-specified computations using multiple algorithms Added text on KEMs and encapsulated keys Updated instructions to the designated experts

It was agreed during the JOSE meeting to describe what fully-specified algorithms for ECDH would look like, for consideration by the working group.

5: OAuth 2.0 Protected Resource Metadata. Updates were:

Switched from concatenating .well-known to the end of the resource identifier to inserting it between the host and path components of it Have WWW-Authenticate return resource_metadata URL rather than resource identifier

It was decided to start working group last call during the OAuth meeting.

6: COSE “typ” (type) Header Parameter. Updates were:

Added language about media type parameters Addressed working group last call comments Changed requested assignment from 14 to 16 due to conflict with a new assignment Addressed GENART, OPSDIR, and SECDIR review comments

This document is scheduled for the April 4, 2024 IESG telechat.

7: Barreto-Lynn-Scott Elliptic Curve Key Representations for JOSE and COSE. Updates were:

Changed to use key type EC for JOSE and equivalent EC2 for COSE for uncompressed key representations Changed identifier spellings from “Bls” to “BLS”, since these letters are people’s initials

We received feedback to not add compressed key representations to the draft.

8: Use of Hybrid Public-Key Encryption (HPKE) with JavaScript Object Signing and Encryption (JOSE). Updates were:

Use existing "alg": "dir" value for HPKE Direct Encryption mode Aligned choices more closely with those of Use of Hybrid Public-Key Encryption (HPKE) with CBOR Object Signing and Encryption (COSE) Defined both Integrated Encryption mode and Key Encryption mode Added IANA Considerations section Removed Post-Quantum Considerations

It was decided to start a working group call for adoption during the JOSE meeting.

Thanks to all who contributed to the progress made on these specifications, both before and during IETF 119!

Wednesday, 20. March 2024

Bill Wendels Real Estate Cafe

Will NAR Settlement turn into “Concession Circus” or empower DIY homebuyers & put savings back into buyer agency?

As the first class action lawsuits began moving towards court dates last summer, began covering a tabletop with of stickies of different “what if” scenarios… The post Will NAR Settlement turn into “Concession Circus” or empower DIY homebuyers & put savings back into buyer agency? first appeared on Real Estate Cafe.

As the first class action lawsuits began moving towards court dates last summer, began covering a tabletop with of stickies of different “what if” scenarios…

The post Will NAR Settlement turn into “Concession Circus” or empower DIY homebuyers & put savings back into buyer agency? first appeared on Real Estate Cafe.

Tuesday, 19. March 2024

Doc Searls Weblog

The Online Local Chronicle

After we came to Bloomington in the summer of 2021, we rented an apartment by Prospect Hill, a quiet dome of old houses just west of downtown. There we were surprised to hear, nearly every night, as many police and ambulance sirens as we’d heard in our Manhattan apartment. Helicopters too. Soon we realized why: […]
Bloomington Hospital on October 15, 2022, right after demolition began.

After we came to Bloomington in the summer of 2021, we rented an apartment by Prospect Hill, a quiet dome of old houses just west of downtown. There we were surprised to hear, nearly every night, as many police and ambulance sirens as we’d heard in our Manhattan apartment. Helicopters too. Soon we realized why: the city’s hospital was right across 2nd Street, a couple blocks away. In 2022, the beautiful new IU Health Bloomington Hospital opened up on the far side of town, and the sounds of sirens were replaced by the sounds of heavy machinery slowly tearing the old place down.

Being a photographer and a news junkie, I thought it would be a good idea to shoot the place often, to compile a chronicle of demolition and replacement, as I had done for the transition of Hollywood Park to SoFi Stadium in Inglewood, California. But I was too busy doing other things, and all I got was that photo above, which I think Dave Askins would categorize as a small contribution to topical history.

Dave is a highly productive local journalist, and—by grace of providence for a lifelong student of journalism such as me—the deepest and most original thinker I know on the topic of what local news needs in our time—and going forward. I’ve shared some of Dave’s other ideas (and work) in the News Commons series, but this time I’m turning a whole post over to him. Dig:::::

In the same way that every little place in America used to have a printed newspaper, every little place in America could have an online local chronicle.

Broadly speaking, an online local chronicle is a collection of facts organized mostly in chronological order. The “pages” of the chronicle can be thought of as subsets of a community’s universal timeline of events. These online local chronicles could become the backbone of local news operations.

First a word about what a local chronicle is not. It is not an online encyclopedia about the little place. It’s not a comprehensive history of the place in any conventional sense. Why should it not try to be those things? Because those things are too hard to think about building from scratch. Where would you even start?

It is at least conceivable that an online local chronicle could be built from scratch because you start by adding new facts that are newsworthy today. A new fact added to the chronicle is a touchstone, about which anyone can reasonably ask: What came just before that?

A working journalist in a little place with an online local chronicle will be in a good position to do two things: (1) add new facts to the local chronicle (2) help define sets of old facts that would be useful to include in the online local chronicle.

A journalist who is reporting the news for a little place would think not just about writing a report of new facts for readers today. They would keep this question in mind: What collection of old facts, if they were included in the local chronicle, would have made this news report easier to write?

Here’s a concrete example. A recent news report written for The B Square Bulletin included a mention of a planned new jail for Monroe County. It included a final sentence meant to give readers, who might be new to that particular topic, a sense of the basic reason why anyone was thinking about building a new jail: “A consultant’s report from two and a half years ago concluded that the current jail is failing to provide constitutional levels of care.”

About that sentence, a reader left the following comment on the website: “I know it’s the last sentence in an otherwise informative article, but at the risk of nit-picking that sentence seems inadequate to the task of explaining the context of the notion of a new jail and the role of the federal court and the ACLU.”

The comment continues: “It would have been better, Dave, to link to your own excellent previous work: https://bsquarebulletin.com/2023/02/20/monroe-county-sheriff-commissioners-square-off-at-committee-meeting-aclu-lawyer-says-look-you-need-a-new-jail-everyone-knows-that/

What this reader did was to identify a set of old facts that should be a collection (a page) in Bloomington’s local chronicle.

It’s one thing to identify a need to add a specific collection of facts to the local chronicle. It’s quite another to figure out who might do that. Working journalists might have time to add a new fact or two. But to expect working journalists to add all the old sets of facts would, I think, be too tall an order.

The idea would be to recruit volunteers to do the work of adding old facts to the online local chronicle. They could be drawn from various segments of the community—including groups that have an interest in seeing the old facts about a particular topic not just preserved, but used by working journalists to help report new facts.

I think many community efforts to build a comprehensive community encyclopedia have foundered, because the motivation to make a contribution to the effort is mostly philosophical: History is generally good to preserve.

The motivation for helping to build the online local chronicle is not some general sense of good purpose. Rather it is to help working journalists provide useful facts for anyone who in the community who is trying to make a decision.

That includes elected leaders. They might want to know what the reasons were at the time for building the current jail at the spot where it is now.

Decision makers include voters, who might be trying to decide which candidate to support.

Decision makers also include rank-and-file residents—who might be trying to decide where to go out for dinner and want to know what the history of health inspections for a particular restaurant are.

For the online local chronicle I have set up for Bloomington, there are very few pages so far. They are meant to illustrate the general concept:

https://thebloomingtonchronicle.org/index.php/Main_Page

The pages are long on facts that are organized in chronological order, and short on narrative. Here are the categories that include at least one page.

Category: People
Category: Data
Category: Boards, Commissions, Councils
Category: List
Category: Topical History

==========

–Dave

By the way, if you’d like to help local journalism out, read more about what Dave is up to—and what he needs—here: https://bsquarebulletin.com/about/


Just a Theory

Extension Registry Namespacing RFC

A proposal for an additional level of name uniqueness for Postgres extension packaging and distribution, based on URIs.

A few weeks ago I brainstormed about decentralized Postgres extension publishing, inspired in part by an examination of Go decentralized publishing. It was…a lot. I’ve been deeply pondering the future of PGXN and the broader extension ecosystem, and want to start to nail down some decisions. To that end, I’d like to propose an update to extension namespacing.

Status Quo

There are currently three ways in which an extension is considered unique:

Only one extension can have a given name within a single Postgres cluster. Names are defined by the name of the control file. It is therefore not possible to have two extensions with the same name in the same Postgres cluster. PGXN follows this pattern: Only one extension can have a given name in the PGXN registry. The first person to release an extension then “owns” its name, and no one else can release an extension with the same name.1 I think dbdev follows the same pattern. Other registries like trunk and pgxman define an extension by the distribution name, at least for the purposes of selecting a binary to install. Thus when you trunk install postgis, you get all of the extensions included, as you’d expect, while trunk install address_standardizer wouldn’t work at all. In the few places that trunk supports installation by extension name, it prompts the user to use the appropriate package name if there’s a conflict. A Modest Proposal

I’d like to propose the following changes to the PGXN Meta Spec to start to move away from extension uniqueness in the broader extension ecosystem and more toward package name.

Add a new field, call it module_path, project_path, project_uri, that, if present, uniquely identifies an extension project and all of its parts. It should be to a Go-style module path (or URI) that identifies the project repository path where a META.json file lives. Retain the provides object where keys identify extensions, but those keys will no longer be globally unique to the registry. In other words, the combination of module_path and extension name uniquely identifies an extension, including an empty module_path. How it Works

Some examples. Let’s say there is an existing extension named pair, included in the distribution named pg_pair:

{ "name": "pg_pair", "version": "1.2.3", "provides": { "pair": { "file": "pair.sql", "version": "1.2.0" } } }

The extension name pair is unique, and pgxn install pair will download the pg_pair v1.2.3 bundle and compile and install pair v1.2.0.

Now someone else comes along and wants to make their own pair with this metadata:

{ "name": "my_pair", "version": "0.2.3", "provides": { "pair": { "file": "pair.sql", "version": "0.2.3" } } }

Just like today, this upload would be rejected, because there is already a registered pair extension. Under my proposal, they can disambiguate by providing a module_path:

{ "name": "my_pair", "module_path": "github/example/pair", "version": "0.2.3", "provides": { "pair": { "file": "pair.sql", "version": "0.2.3" } } }

This upload would be allowed. With these two releases, someone attempting to install pair would see something like this:

$ pgxn install pair ERROR: Duplicate extension name “pair”. Install one of these instead: * pgxn.org/dist/pair * github/example/pair

Note the the module path pgxn.org/dist/pair in the the first option. This is the default module path for distributions without a module path.2 But now the user can select the proper one to install:

$ pgxn install pgxn.org/dist/pair INFO: latest version: pgxn.org/dist/pair@1.2.3 INFO: building extension INFO: installing extension INFO: done!

Furthermore, the PGXN client will prevent the user from later installing a conflicting extension. The failure would look something like:

$ pgxn install github/example/pair INFO: latest version: pgxn.org/dist/pair@0.2.3 ERROR: Cannot install extension “pair” from pgxn.org/dist/pair: ERROR: A conflicting extension named “pair” is already installed ERROR: from pgxn.org/dist/pair Features with Benefits

I see a number of benefits to this change:

Compatibility with the v1 metadata spec, so that no data migration or distribution indexing is required. It loosens up extension namespacing (or name registration, if you prefer) while adding additional metadata to help users evaluate the quality of an extension. For example, does it come from a well-known developer? You can see it right in the module path. It creates a pattern to eventually allow auto-indexing of extensions. For example, if you run pgxn install github.com/example/pew, and PGXN doesn’t have it, it can look for a META.json file in that repository and, if it exists, and there’s a semver release tag, it could try to index it and let the user install it. There are ownership issues to be worked out, but it has possibilities. It preserves the Postgres core concept of extension identity while putting in place a well-established (by Go modules and widespread use of URIs in general) that the Postgres core could eventually adopt to allow more flexible extension namespacing. Request for Comments

What do you think? Good idea? Terrible idea? Please hit me with your thoughts on Mastodon, or via the #extensions channel on the Postgres Slack. I’d like to get this decision (and a few others, stay tuned!) nailed down soon and start development, so don’t hesitate? I need your help to prevent me from making a huge mistake.

Unless the owner would like to share ownership with someone else, in which case they can email me to request that another user be granted “co-ownership”. They can also request to transfer ownership to another user, after which the original owner will no longer be able to release the extension. ↩︎

Or, if the META.json file has a repository resource with a URL, PGXN could index it as the implied module path. Or, failing that, maybe it should fall back on the distribution name instead of a pgxn.org path, and prompt with pg_pair/pair↩︎

More about… Postgres PGXN PGXN v2 Extensions Go Namespacing Registry

@_Nat Zone

XNS 技術仕様書1.0版(2002-07-09)が出てきたので貼っておく

ファイルサーバをほっていたら、XNS Tecnical Specs version 1.0(2002年7月9日版) が出てきたのでここに貼っておきます。 XDI.orgのサイトが生きていればそこからダウンロードできればよかったのですが、サイトが放棄されてしまっているので、わたしが…

ファイルサーバをほっていたら、XNS Tecnical Specs version 1.0(2002年7月9日版) が出てきたのでここに貼っておきます。

XDI.orgのサイトが生きていればそこからダウンロードできればよかったのですが、サイトが放棄されてしまっているので、わたしが代わりに置き場所を提供します。

XNSは後にXRI/XDIになり、OpenID 2.0のコア仕様となった技術です。現在のDIDもその延長線上にあります。わりと参考になることが色々あると思います。昨今分散IDとかいう人はこれは読んでおいても良い仕様かとおもいます。

XNS_Technical_Specsダウンロード

Doc Searls Weblog

The end of what’s on, when, and where

But not of who, how, and why. Start by looking here: That’s a page of TV Guide, a required resource in every home with a TV, through most of the last half of the 20th century. Every program was on only at its scheduled times. Sources were called stations, which broadcast over the air on […]

But not of who, how, and why. Start by looking here:

That’s a page of TV Guide, a required resource in every home with a TV, through most of the last half of the 20th century.

Every program was on only at its scheduled times. Sources were called stations, which broadcast over the air on channels, which one found using a dial or a display with numbers on it. Stations at their largest were regional, meaning you could only get them if you were within reach of signals on channels.

By the time TV came along, America was already devoting its evenings to scheduled programs on AM radio, which was the only kind of radio at the time.  After TV took over, everyone sat in a room bathed in soft blue light from their TV screen. Radio was repurposed for music, especially rock & roll. My first radio was the one on the right. Being in a New York suburb, my stations were WMCA/570, WMGM/1050, WINS/1010, WABC/770, and WKBW/1520 (coming in at night from Buffalo, loud as a local). All of those stations are still on the air, but playing talk, sports and religious programs.  Many fewer people listen to AM radio, or over-the-air anything, anymore. They watch and listen to glowing rectangles that connect to the nearest router, wi-fi hot spot, or cellular data site. Antennas exist, but wavelengths are so short that the antennas fit inside the rectangles.

The question “what’s on?” is mostly gone. “Where?” is still in play, at least for TV, because nobody knows what streaming service carries what you want. All the “guides”—Apple’s, Google’s, Amazon’s, everyone’s—suck, either because they’re biased to promote their own shows or because there’s simply too much content and no system to catalog and display all of it. Search engines help, but not enough.

The age we’re closing is the one Jeff Jarvis calls The Gutenberg Parenthesis. The age we’re entering is the Age of Optionality. It began for print with blogging, for TV with VCRs and DVRs, and for radio with podcasting and streaming. John Robb calls what we have now packetized media. It is obsolescing all the media we knew, almost too well, by putting it in the hands Clay Shirky named in 2009 with Here Comes Everybody.

What we make of this new age is now up to all of us. We are who, how, and why.

 

Monday, 18. March 2024

@_Nat Zone

NHKクロニクルが便利〜過去の番組の検索

NHKの放送した番組というのは国民の財産なわけですが、過去に放映した番組がわからないと、その財産のありかもわからないことになります。そのためには、番組についてのメタデータが検索できるようになっているのが望ましいですが、その第一歩として、過去の番組表が検索できるだけでもだいぶ違いま…

NHKの放送した番組というのは国民の財産なわけですが、過去に放映した番組がわからないと、その財産のありかもわからないことになります。そのためには、番組についてのメタデータが検索できるようになっているのが望ましいですが、その第一歩として、過去の番組表が検索できるだけでもだいぶ違います。

現在のNHKのメインサイト「nhk.jp」ではサイトトップから番組表検索へのリンクがありますが、これでは直近分しか検索することができません。これに対してNHKクロニクルでは過去分を検索することができます。

【図表】NHKクロニクルでは過去の番組表を検索できる(出所)https://www.nhk.or.jp/archives/chronicle/

ためしに、サイトウ・キネンと入れて検索してみましょう。すると、どさっと大量の検索結果が逆時系列で出てきます。わたしが知りたかったのは2007年に放映されたことを覚えている番組なので、そのあたりまで時系列を遡っていきます。

【図表】14ページ目で2007年に到達

今回は右上のページ表示をクリックして徐々に遡っていきましたが、URLに「&page=14」のように記入することによってページを飛ぶこともできるようです。こうやっていって、わたしが知りたかった番組は2007年12月29日に放映されていたことがわかりました。

【図表】知りたかった番組は2007年12月29日のっぽい

といっても、上記の画像からわかるように、このページだけではどんな番組だったかわかりません。なので、「番組詳細」を順番にクリックしていってこの番組だとわかったのでした。

【図表】詳細をクリックしていって番組を発見 (出所)https://www.nhk.or.jp/archives/chronicle/detail/?crnid=A200712291800001301000

これ、実はサイトウ・キネン・フェスティバルでデュティユー自身も来日して「Le Temps l’Horloge」が初演された録画でとても貴重なものです。作曲者自身が拍手しているところが写っています。また、「瞬間の神秘」も録画されているのはなかなか珍しいのではないかと思います。

この詳細ページのURLは「https://www.nhk.or.jp/archives/chronicle/detail/?crnid=A200712291800001301000」となっています。このcrnid というのがこのコンテンツの識別子のようです。(←ちゃんとIDネタを振った!)

放送時期がよくわからなかったらどうするか:=>&検索もできる!

上記では、放送時期をわたしが覚えていたので遡っていって検索するという技がつかえました。これだと放送時期がわからないと検索がすごく大変になります。ここで「デュティユー サイトウ・キネン」のように&検索ができればぐっと対象を絞れます。なので、やってみましたらできました。半角スペースでキーワードを区切って検索すればよいようです。

【図表】&検索も可能 (出所)https://www.nhk.or.jp/archives/chronicle/search/?keyword=デュティユー%E3%80%80サイトウ・キネン

この年2回放映されていることがわかります。わたしが実際に見たのは9月8日の生中継の方ですね。実際に演奏された直後なので。番組詳細を見てみます。

【図表】こちらのほうが番組詳細が詳しい (出所)https://www.nhk.or.jp/archives/chronicle/detail/?crnid=A200709081900001301000

なんと、こちらのほうが番組詳細が詳しい!「瞬間の神秘」がそれだけでなく、正式タイトル「瞬間の神秘(24の弦楽器、ツィンバロムと打楽器のための)1」としてのっています。12月のは再放送なので内容は同じはずですが、番組情報には差があるのですね。一つ一つ見に行ったほうが良さそうです。

Google検索ではひっかからない

このように便利に検索できるNHKクロニクルですが、残念なことがひとつあります。それはGoogle検索でひっかからないということです。試しに「site:www.nhk.or.jp 瞬間の神秘(24の弦楽器、ツィンバロムと打楽器のための)」と検索してみて下さい。2024年3月18日現在、何も返ってきません。これは、robots.txt でクローリングが禁止されているからですね。

User-agent: *
Disallow: /*.cgi$
Disallow: /*.cgi?
Disallow: /*/api/
(中略)
Disallow: /chronicle/

【図表】chronicleがrobots.txtで禁止されている (出所)https://www.nhk.or.jp/robots.txt

国民の利便性という観点ではちょっと残念な気がします。ここはallowしてほしいところです。2

というわけで、NHKクロニクルが便利。でもGoogleからは検索できないので直接NHKクロニクルに行って検索しないとだめ、というお話でした。あと、こうして検索したものにどうやったらアクセスできるようになるかですね。デジタル化の進展だけでなく権利処理の話も関わってきますが、このあたり、アクセスしやすくなるとよいですね。


Just a Theory

Talk: Building Trunk

Ian Stanton will be giving a talk, “Building Trunk: A Postgres Extension Registry and CLI” on Wednesday at noon US Eastern / 16:00 UTC.

Update: 2024-03-22: Slides and video linked below.

This week, my fellow Tembonaut Ian Stanton will present the extension ecosystem mini-summit talk, “Building Trunk: A Postgres Extension Registry and CLI”. We felt it important to get some insight from a couple of the recently-developed Postgres extension registries: what problems they set out to solve, how the were built and operate, successes addressing their issues, and what issues remain, both for the projects and the ecosystem overall. Ian plans to give us the low-down on trunk.

Join us! Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-22: Such a great discussion, was great! Missed it? I got you. Links:

Video PDF Slides Notes More about… Postgres Trunk Extensions PGConf Summit

@_Nat Zone

デジタル社会の信頼を構築する「デジタルID」トラスト、日本のデジタルIDのこれから

わたしの顧問先のPwC Japanグループ主催のDigital Identity and Privacy Tech Forum 2024【配信期間:2024年3月11日(月)~2024年4月30日(火)】に出演しています。同フォーラムは、「デジタル化に伴うプライバシー保護領域の重…

わたしの顧問先のPwC Japanグループ主催のDigital Identity and Privacy Tech Forum 2024【配信期間:2024年3月11日(月)~2024年4月30日(火)】に出演しています。同フォーラムは、「デジタル化に伴うプライバシー保護領域の重要テーマであるデジタルアイデンティティとプライバシーテックの最新動向について詳しく解説」するオンライン・オンデマンドのフォーラムで、いくつかのセッションが提供されています。

デジタル社会の信頼を構築する「デジタルID」トラスト、日本のデジタルIDのこれから デジタルアイデンティティを支える技術―ソリューション導入の失敗事例と成功事例― プライバシーガバナンス最前線―データ利活用を促進していくために、いま企業に求められていること― プライバシーテックの活用と今後の展望

わたしが出演しているのは、このセッションを企画したPwCコンサルティング合同会社 ディレクター 柴田健久率いる、【1. デジタル社会の信頼を構築する「デジタルID」トラスト、日本のデジタルIDのこれから】です。

セッションの参加者

ご一緒させていただいたのは、

デジタル庁セキュリティアーキテクト 満塩尚史 OIX チーフ・アイデンティティ・ストラジテスト ニック・マザショウ PwCコンサルティング合同会社 ディレクター 柴田健久

の3氏に、パーソナリティとしてフォーブスジャパンの藤吉氏が出演しておられます。

内容は

「デジタルアイデンティティの概念」 「グローバルレベルでの、デジタルIDトラストの標準化の動き」 「日本の政府関係者のIDエコシステムへの期待」 「日本企業の実態と今後の活用に向けた議論」 「未来への展望」

といったことについて各人の見解と意見交換を行っています

収録はLEDで壁や床が作ってあるスタジオ1でで、なかなか良い経験2をさせていただきました。最近の視力低下でカンペが全く見えない〜それどころか次のトピックも見えない状況でアドリブ力全開でやっていますが、いかがでしょうか。ぜひご覧になってご感想をお聞かせ願えればと思います。

なお、他の3セッションもタイトルからして非常に興味を惹かれるものです。まだわたしも見ていませんが、ぜひ時間を作って視聴したいと思っています。

視聴申込はこちらのページよりお願いいたします。

https://www.pwc.com/jp/ja/seminars/digital-identity-forum2024.html

Friday, 15. March 2024

Just a Theory

Mini Summit One

A rough transcript of my talk “State of the Extension Ecosystem”, along with interspersed chat comments and appearances by Bagel.

Great turnout and discussion for the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28. Thank you!

The talk, “State of the Extension Ecosystem”, was followed by 15 minutes or so of super interesting discussion. Here are the relevant links:

Video Keynote PDF Slides

For posterity, I listened through my droning and tried to capture the general outline, posted here along with interspersed chat history and some relevant links. Apologies in advance for any inaccuracies or missed nuance; i’m happy to update these notes with your corrections.

And now, to the notes!

Introduction

Introduced myself, first Mini Summit, six leading up to the in-person summit on May 28 at PGConf.dev in Vancouver, Canada.

Thought I would get it things started, provide a bit of history of extensions and context for what’s next.

Presentation

Postgres has a long history of extensibility, originally using pure SQL or shared preload libraries. Used by a few early adopters, perhaps a couple dozen, including …

Explicit extension support added in Postgres 9.1 by Dimitri Fontaine, with PGXS, CREATE EXTENSION, and pg_dump & pg_restore support.

Example pair--1.0.0.sql:

-- complain if script is sourced in psql and not CREATE EXTENSION \echo Use "CREATE EXTENSION pair" to load this file. \quit CREATE TYPE pair AS ( k text, v text ); CREATE FUNCTION pair(text, text) RETURNS pair LANGUAGE SQL AS 'SELECT ROW($1, $2)::pair;'; CREATE OPERATOR ~> (LEFTARG = text, RIGHTARG = text, FUNCTION = pair);

Bagel makes an appearance.

Example pair.control:

# pair extension comment = 'A key/value pair data type' default_version = '1.0' module_pathname = '$libdir/pair' relocatable = true

Example Makefile:

EXTENSION = pair MODULEDIR = $(EXTENSION) DOCS = README.md DATA = sql/pair--1.0.sql TESTS = test/sql/base.sql REGRESS = base REGRESS_OPTS = --inputdir=test MODULES = src/pair PG_CONFIG ?= pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) include $(PGXS)

Build and Install:

$ make make: Nothing to be done for `all'. $ make install mkdir -p '/pgsql/share/extension' mkdir -p '/pgsql/share/pair' mkdir -p '/pgsql/share/doc/pair' install -c -m 644 pair.control '/pgsql/share/extension/' install -c -m 644 sql/pair--1.0.sql '/pgsql/share/pair/' install -c -m 644 README.md '/pgsql/share/doc/pair/' $ make installcheck # +++ regress install-check in +++ # using postmaster on Unix socket, default port ok 1 - base 15 ms 1..1 # All 1 tests passed.

CREATE EXTENSION:

$ psql -d try -c 'CREATE EXTENSION pair' CREATE EXTENSION $ pg_dump -d try -- -- Name: pair; Type: EXTENSION; Schema: -; Owner: - -- CREATE EXTENSION IF NOT EXISTS pair WITH SCHEMA public; -- -- Name: EXTENSION pair; Type: COMMENT; Schema: -; Owner: -- COMMENT ON EXTENSION pair IS 'A key/value pair data type';

Many of us saw opportunity in this new feature.

PostgreSQL today is not merely a database, it’s an application development platform.

— Me, 2010

Proposed to build PGXN. Raised funds to build it in late 2010. Launched site April 2011; Daniele Varrazzo released CLI, and Dickson Guedes released the dev CLI.

Problems PGXN set out to solve:

Source code distribution with user registration and namespacing Discovery: Search, read docs, brows tags Installation: CLI to compile and install using PGXS or Configure

PGXN Components:

Meta Spec Manager Root Mirror API Server Site Client Utils CLI CI/CD Image

Problems out of scope for PGXN:

Binary packaging and distribution Defer to apt/yum Developer tooling (though dev utils helped) Build tooling Defer to core (PGXS)

PGXN Shortcomings:

Little development since 2012 Search limitations Docs preferred Most distributions have few docs Some issues addressed in last few weeks Source of Record Minority of available extensions on PGXN Releases uneven or neglected

In classic SDLC fashion, PGXN POC shipped as an MVP and was neglected.

— Me, Just Now

Been peripheral to Postgres extensions for the last 10-12 years, but some things have happened.

Non-Core extension counts:

PGXN: 382 joelonsql/PostgreSQL-EXTENSIONs.md: 1,186

Daniele asks about that last source, which is just a list in a gist.

Joe Nelson links to the gist in Zoom chat. It is not his list, contrary to my off-the-cuff guess

Why haven’t extensions taken off?

Lost Opportunities

No one canonical source to discover and install extensions Difficult to find and discover extensions without canonical registry Most extensions are under-documented and difficult to understand They are also hard to configure and install; most people don’t want or need a compiler The maturity of extensions can be difficult to gauge, not systematized, must each be independently researched David Christensen in Chat “attention economy/awareness, NIH, etc” Jeremy S in chat: “Maybe some people don’t know they are using extensions (I think that’s possible to some degree)” There is no comprehensive binary packaging Centralized source distribution is insufficient (even if it were complete) jubilee in chat: Trust aspect? David Johnson in chat: To seem legit you need to touch the repo at least annually to ensure it works on the newest major release. Even if you just do your compile and update the readme. I mention using pgxn-utils and GitHub workflows to ensure my extensions continue working There is insufficient developer tooling; pgxn-utils not well-maintained, don’t build on recent Rubies, but pgrx has a lot of Rust-oriented tooling Eric in chat: ❤️ jubilee in chat: 🦀 mentioned!

Filling the Gaps

dbdev: “The Database Package Manager for Trusted Language Extensions”: Includes only TLEs, no binary extensions

trunk: “A Postgres Extension Registry”: Binary distribution of curated extensions, desires to be comprehensive and cross-platform

pgxman: “npm for PostgreSQL”: Binary Apt package distribution of curated extensions packaged with, desires to be comprehensive and cross-platform

Emphases: Ease of Use. Screenshot from pgxman:

$ curl -sfL https://install.pgx.sh | sh - 👏🎉 pgxman successfully installed $ pgxman install pgvector The following Debian packages will be installed: postgresql-14-pgxman-pgvector=0.5.1 Do you want to continue? [Y/n] y pgvector has been successfully installed.
Daniele in chat: “Missing a “curl | sudo sh” there…. 👀” Greg Mullane (CrunchyData) [he/him] in chat: “Really not a fan of that “pipe curl stuff from internet into sh” system.” Jeremy S in chat: “Someone recently reprimanded me for putting curl | psql in an extension README. From a security perspective it probably sets a better example to do curl >file.sql … psql file.sql (encourage users not to run from Internet but read/review first)” * jubilee in chat: “apt/yum install is just a better UI over curl | sh :^)” Jeremy S in chat: “Yes and once you’re to that point there’s already more supply chain verification happening” Jeremy S in chat: “It’s usually just the initial bootstrap into any system, if the setup wasn’t already in your distro”

Emphases: Platform neutrality. Screenshot from trunk:

Architecture x86-64 Operating system Debian/Ubuntu

Emphases: Stats. Screenshot from dbdev:

Downloads

20 all time downloads 0 downloads in last 30 days 1 download in last 90 days 0 downloads in last 180 days

Emphases: Curation. Screenshot from trunk:

Featured 7 Analytics 13 Auditing / Logging 7 Data Change Capture 6 Connectors 27 Data / Transformations 49
Damien Clochard in chat: gtg, see you later guys !

MVPs

trunk: Manual integration, Currently Debian-only

pgxman: Form-based submission, Currently Apt-only

dbdev: TLEs only, CLI publishing

David Christensen in chat: “go has a pretty good extensions infra, imho, wrt discovery/docs, etc. also has the benefit of the package names being the URL to access it, which is a nice convention.”

New Opportunities Today

What are the community opportunities for the extension ecosystem?

Some ideas:

Improved dev tools: More than pgxn-utils and pgrx Canonical registry: All publicly-available extensions in one pac3 Easy publishing: auto-discovery or CI/CD pipeline publishing Continuous Delivery: CI/CD pipeline publishing File-free installation: TLEs Documentation: Something like Go docs or Rust docs File management: Put all the files for an extension in one directory Improved metadata Library Dependencies: utilities used by extensions Platform Dependencies: system packages Build pipelines: PGXS, pgrx, make, cpan, pypi, etc. Artifacts: Binaries build on release Classification: Curated in addition to tags Extension Types: Extensions, apps, background workers, loadable libraries Derived Services Binary Packaging: Distributed binaries for many platforms Ratings & Reviews: Stars, thumbs, comments Aggregated Stats: Repository stats, etc. Smoke Testing: Matrix of multi-platform test results Security Scanning: Reporting vulnerabilities Badging & Curation: Third-party classification, badging various statuses

Extension Ecosystem Summit

Collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

🏔️ Your Summit Organizers

David Wheeler, Tembo, PGXN Jeremy Schneider, AWS, dsef David Christensen, Crunchy Data, pg_kaboom Keith Fiske, Crunchy Data, pg_partman Devrim Gündüz, EnterpriseDB, yum.postgresql.org

Devrim Gunduz in chat: Thanks David!

Schedule:

March 6: David Wheeler, PGXN: “State of the Extension Ecosystem” March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI” April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions” April 17: Jonathan Katz: “TLE Vision and Specifics” May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod” May 15: (Placeholder) David Wheeler, PGXN: “Metadata for All: Enabling discovery, packaging, and community”

Ultimately want to talk about what’s important to you, the members of the community to make extensions successful.

Discussion

Eric: I’m Eric Ridge, one of the developers behind pgrx, as you’re going through this process of building a modern extension ecosystem, let us know what we can do on the Rust side to help make your lives easier, we’re happy to help any way we can.

Steven Miller in chat:

These are some areas of interest we noticed building Tembo

Binary packaging / distribution:

Variable installation location System dependencies / uncommon system dependencies or versions Chip specific instructions (e.g. vector compiled with avx512) Extension-specific file types / extra data files (e.g. anonymizer .csv data)

Turning on extensions automatically

Different ways to enable extensions does it need load (shared_preload_libraries, session_… etc)? Does it use create extension framework? Does it require a specific schema? What about turning on in multiple DBs at the same time in the same cluster, with background worker? Disabling, what data will be lost? Validating safety / user feedback on upgrade?

In cloud / SaaS:

Installing + enabling extensions quickly, without restart Persisting extension files Extension-specific files (e.g. libraries) versus postgres’ libraries Updating Troubleshooting crashes / core dumps

Anyone else have similar problems / tips?

Steven Miller: These were just things I noted during the presentation. Curious if these are interesting to others on the call.

Daniele in chat: “Regards binary distributions, python wheels might be a useful reference.”

Steven Miller: That’s good point! What do people think of idea to just install extensions onto servers, not packages, persisted on the disk, next to PGDATA so they go into a single persistent volume, and the rest is managed by an immutable container.

Daniele: Had experience on Crunchy where we had to replace an image to get an extension. Looked for feature to have a sidecar or a volume with the extension.

Steven Miller: Didn’t have a separate directory just for extensions, it’s just pg_config --libdir fore everything. Had to persist entire directory, including those files form the base build, their internal files. Would have been nice to have a separate directory, extra-libdr or extra-sharedir, something like that.

Yurii Rashkovskii: I was working on a patch to do exactly that, but haven’t completed it. Was going to introduce additional directories to search for this stuff.

Steven Miller: That would be really awesome.

Jeremy S in chat: “Advantage of that is that a single image can be shared among systems with different needs”

Eric in chat: “Thoughts around “enterprise repositories” that could be self-hosted and disconnected from the internet?”

Ian Stanton in chat: “I’ll touch on this in the next talk, it’s crossed our minds when building the Trunk registry”

Steven Miller: I think that’s a great idea.

Bagel reappears.

David Wheeler: PGXN originally designed so anyone could run Manager and their own root mirror, and maybe rsync from the community one. Don’t know that anyone ever did, it’s a little complicated and most people don’t want to work with Perl. [Chuckles]. Definitely think there’s space for that. If you work with Java or Go or maybe Rust, lots of orgs like Artifactory that provide internal registries. Could be cool use case for Postgres extensions.

David Christensen in chat: “something that could support extension batches; like groups of related extensions that could be installed in bulk or loaded in bulk (so could accommodate the shared systems with different individual extension needs, but could be shared_preload_library configured)”

“Steven Miller” in chat: “Sounds familiar”

Greg Mullane (CrunchyData) [he/him] in chat: “All these items remind me of CPAN. We should see what things it (and other similar systems) get right and wrong. I’ve learned from CPAN that ratings, reviews, and badging are going to be very difficult.”

David Christensen in chat: “I’d assumed at the time that it was largely lifted (at least in philosophy/design) from CPAN. 🙂” David Wheeler (he/him) in chat: “yes”

Jeremy S: I think this is mostly focused on developers, but I had recent experience where multiple people in the past few months, new to Postgres, are trying to understand extensions. They install a version and then see there are like 15 versions installed, so confused. Goes back to the install file. Bit of UX angle where there are sharp edges where people trying to make sense of extensions, the flexibility makes it hard to understand. Some might be some nice guides, some architectural things explaining PGXS, or improvements to make to the design. Related, not main topic, but good to keep end user UX and devs building on Postgres but not Postgres developers, who run their businesses.

David Wheeler: Yeah all the files can be confusing, which is why I think trunk and pgxman trying to simplify: Just run this command and then you have it.

Steven Miller in chat: “I really agree with what Jeremy is saying. Right now PGXN and Trunk are taking and approach like “whatever an extension could do / how it may work, it should work on this registry”. But I think more standards / “what is a normal extension” would make the UX much easier.”

Jeremy S: Even with that the available extensions view is still there. Some of that is just schema management, and that’s how core is doing schema management.

Steven Miller in chat: I exactly agree about the concern about multiple extensions within a package. Also version and name mismatches

David Wheeler: And not everything is an extension, you just want to work, or your extension is just utility like pg_top you just want to use. Extensions I think were a tremendous contribution to Postgres itself, but a lot of it was wrangling the existing system for building Postgres itself to make it work for that. Could be very interesting, though quite long term — and I know Dimitri has tried this multiple times — to build a proper package management system within Postgres itself, to eas a lot of that pain and burden.

Tobias Bussmann in chat: “Thank you for the great overview and for taking this topic further! Unfortunately, I’ll not be able to join at pgConf.dev but will follow whatever results this will lead to. As a package maintainer, I am constantly looking in a easy way to allow users to use extensions without having to package everything ;)”

Steven Miller in chat: “Like auto explain for example right. i.e. a LOAD only “extension””

*Yurii Rashkovskii: An interesting topic, what extensions are capable of doing and how they can be more self-contained. Like Steven was saying in chat: how easy is it to load and unload extensions. Example: want an extension to hook into a part of Postgres: executor, planner, etc. How do you go about enabling them? How you unload them, introspect, list of hooks.

Omni extension provides a list of all hooks, and when you remove an extension it removes the hooks that provide the extension, but still not part of the core. Hooks one of the greatest ways to expand the functionality of Postgres, allows us to experiment with Postgres before committing to the full cycle of getting a patch into Postgres. Lets us get it to users today to try. if it makes a lot of sense and people want it, time to commit to the process of getting a patch committed. But if we don’t hve this venue, how to get extensions in, our ability to try things is limited.

jubilee in chat: Hmm. It seems my audio is not working.

David Wheeler: The next session is two weeks from today: Ian Stanton is going to talk about “Building Trunk: A Postgres Extension Registry and CLI”. Will be interesting because a number of people have decided to build a binary packaging system for extensions, just to air out what the challenges were, what problems they wanted to solve, what problems remain, and where they want to take it in the future.

Jeremy S* in chat: “Bagel clearly has something to say”

*David Wheeler: jubileee I see your audio issues, do you just want to type your question into chat? We can also discuss things in the #extensions channel on the Postgres Slack

David Wheeler: Thank you all for coming!

jubilee in chat: “I had a question which is about: Does Postgres actually support docs for extensions? Like, a lot of people don’t really WANT to read a README. Can you get docstrings for a function in psql?”

Ian Stanton in chat: “Thank you David!”

jubilee in chat: And if not, why not?

Post Presentation Discussion

From Slack:

David Wheeler: I see now that “jubilee” left their question in the Zoom chat. David Wheeler: The closest attempt at this I’ve seen is pg_readme, which will generate Markdown for an extension from comments in the catalog and write it out to a file. David G. Johnson: The comment on command adds in database comments that psql describe commands should display.

Also on Slack, Greg Sabino Mullane started a longish thread on the things we want to do and build.

More about… Postgres PGXN Extensions PGConf Summit Bagel

Thursday, 14. March 2024

@_Nat Zone

英財務省がマネー・ロンダリング規制(MLR)の実効性向上に関するパブコメを開始

ロンドン時間3月11日、英財務省は、マネーロンダリング規制 (マネロン規制, MLRs) の有効性を向上させるためのパブコメを開始しました。このパブコメには、顧客を確認する際にデジタルアイデンティティの使用を最適にサポートする方法についての検討が含まれます。 マネロン規制は、金融…

ロンドン時間3月11日、英財務省は、マネーロンダリング規制 (マネロン規制, MLRs1) の有効性を向上させるためのパブコメ2を開始しました。このパブコメには、顧客を確認する際にデジタルアイデンティティの使用を最適にサポートする方法についての検討が含まれます。

マネロン規制は、金融サービス、不動産取引、ギャンブル、美術品の販売など、デジタルアイデンティティサービスを採用する可能性のあるさまざまな規制対象セクターをカバーしています。デジタルアイデンティティサービスは、マネーロンダリング防止チェックなどのマネロン規制要件を満たすために使用できますが、(英国において)認定されたデジタルアイデンティティサービスを規制要件を満たすためにどのように使用できるかについての具体的なガイダンスはありません。

昨年、英国科学・イノベーション・技術省3が開催したデジタルアイデンティティと属性の普及スプリント4では、英国のデジタルアイデンティティと属性信頼フレームワークとマネロン規制との関係が曖昧であることが、導入の障壁になっているという声が聞かれました。この関係を明確にすることで、重要なたくさんのユースケースにおいてデジタルアイデンティティの効率性と安全性の利点を活用し、経済犯罪との闘いを支援することができるでしょう。

信頼されたデジタルアイデンティティサービスの普及を可能にするための活動に携わっておられる各人・各団体におかれては、このパブコメに関心をもたれる向きも多いでしょう。このパブコメに回答をご希望の場合は、こちらから行うことができます。

パブコメは12週間で、2024年6月9日に締め切られます。OpenID Foundationではパブコメ提出のための準備に取り掛かっています。


Doc Searls Weblog

Happy Birthday, Mom

Mom would have been 111 today. She passed in ’03 at 90, but that’s not what matters. What matters is that she was a completely wonderful human being: as good a mother, sister, daughter, cousin, friend, and teacher as you’ll find. There is a thread in Facebook (which seems to be down now) on the […]
Eleanor Oman, Alaska, circa 1942

Mom would have been 111 today. She passed in ’03 at 90, but that’s not what matters.

What matters is that she was a completely wonderful human being: as good a mother, sister, daughter, cousin, friend, and teacher as you’ll find.

There is a thread in Facebook (which seems to be down now) on the subject of Mom as a third and fourth grade teacher in the Maywood New Jersey public school system. Students who inhabited her classes more than half a century ago remember her warmly and sing her praises. Maybe I’ll quote some of those later. (I’m between flights right now, but getting ready to board.)

Once, sitting around a fire in the commune-like place where I lived north of Chapel Hill in the mid-’70s, discussion came around to “Who is the sanest person you know?” I said “My mother.” Others were shocked, I suppose because they had issues with their moms. But I didn’t. Couldn’t. She was too wise, good, and loving. (Also too smart, quick-witted, tough, and unswayed by bullshit.)

Perhaps I’m idealizing too much. She had flaws, I’m sure. But not today.

Tuesday, 12. March 2024

Timothy Ruff

The American General Data Protection Regulations (AGDPR)

AGDPR doesn’t exist, but it should. In 2012 the Europeans released the first proposal for their General Data Protection Regulations (GDPR), which ran a four-year political gauntlet and evolved into its final, official version in 2016. Because of the ghastly surveillance situation we now find ourselves in America and elsewhere, perpetrated by both private industry and government, I think Amer
AGDPR doesn’t exist, but it should.

In 2012 the Europeans released the first proposal for their General Data Protection Regulations (GDPR), which ran a four-year political gauntlet and evolved into its final, official version in 2016. Because of the ghastly surveillance situation we now find ourselves in America and elsewhere, perpetrated by both private industry and government, I think America also desperately needs comprehensive privacy rules, rules informed by now eight years of hindsight and infused with an American value not found in Europe’s GDPR: privacy from government.

If I could wave a magic wand, an American GDPR would improve on Europe’s GDPR in two fundamental ways:

1. Eliminate the idea that data about people can be “anonymized” or “de-identified”.

Data about people can no longer be effectively de-identified by removing personally identifying information (PII) from the dataset. Maybe this was generally still possible in 2016, but not any more.

GDPR set off a wave of innovation among data scientists who, like the old game “Name That Tune”, took the new regulation as a challenge and developed numerous ways to re-identify people using only a few data points that aren’t considered PII, techniques they’ve since made widely available. And that was before the recent explosion of AI tools which make re-identification easy without the direct use of such tools. Unfortunately, fully GDPR-compliant “de-identified” data sets are now trivially easy to re-identify for anyone with access to a computer.

(For more details on this topic, see this Harvard paper about “simple” attacks that defeat GDPR’s PSO Secure / “singling out” privacy protection provisions.)

If almost any data about a person is usable to re-identify them, then the whole concept PII is no longer relevant; any data about a person is PII. That is precisely how an AGDPR definition of “personal data” can improve upon the European one: any data about an person. It’s both simpler and more accurate.

2. Include protections for privacy *from government*, in addition to privacy from the private sector.

Europe’s GDPR specifically exempts government, AGDPR should do the opposite: it should bind government’s activities in relation to obtaining, using and retaining personal data.

There are obviously necessary and important reasons for government to obtain, use, and retain personal data, and those reasons should be explicitly enumerated and protected. But the U.S. federal government has gone absolutely bananas when it comes to how much of our personal data they are obtaining about us and our activities, through legal but largely unchallenged means. The degree to which it’s happening is way beyond what’s reasonably needed in a free country that espouses limited government, and that has a Fourth Amendment that secures:

“the right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures”.

How is data about us not our “effects”? How is government gathering our effects en masse without our explicit consent not “unreasonable”?

Journalist Byron Tau’s new book about U.S. government surveillance of law-abiding citizens, Means of Control, is outrageous. (Summary article here.) You cannot read it and not become deeply alarmed at the enormous, secret, ubiquitous surveillance state that already exists in the U.S. No doubt European governments are doing the same with their citizens’ data, but — and I’m generalizing here — they don’t have the same foundational distrust of government that the U.S. was born with and so may not be bothered enough to do anything about it, as evidenced by government’s total exemption from their GDPR. I hope, and believe, that once more Americans become aware of the extent of what’s happening, an American GDPR that clearly defines government’s role in privacy can become a real, politically viable possibility.

There is a model to follow. Christopher Bramwell, the Chief Privacy Officer of Utah, was the driving force behind a groundbreaking new piece of privacy legislation just enacted in Utah that constrains how personal data must be treated by government, without saying anything about private industry. As far as I know it’s the first of its kind, anywhere, and hopefully the first of many to follow in other states and countries. This bill gained a fortunate political boon as it advanced: it became a ‘vote against privacy’ to not support it; it passed unanimously.

Of course, privacy from big tech, surveillance capitalism, and private industry in general is a big deal, but private industry doesn’t have a monopoly on violence and taxes like government does; they mostly just want to sell me stuff. Still, the European GDPR shows an example of how to constrain private industry, and an American GDPR should learn from and incorporate those lessons.

The ideal outcome of an AGDPR would provide privacy both from private industry and from government, and finally put a halt to all federal activities of indiscriminate obtaining, using, and retaining of our personal digital “effects”, whether or not they’re considered PII.

Monday, 11. March 2024

Phil Windleys Technometria

Decentralizing Energy

The end of the Age of Oil

The Accidental Superpower discusses a post-Bretton Woods world where easy, safe shipping cannot be assumed. I think in this "scared new world," renewables change the dynamics of energy flows by moving production closer to energy use.

My wife, Lynne, recently gave me a copy of Peter Zeihan's book, The Accidental Superpower: Ten Years On. The book was originally published in 2014, but Zeihan has updated it by inserting chapters talking about what he got right in 2014, what he got wrong, and why. The focus of the book is geopolitics—how geography and demographics shapes the world order—and how Bretton Woods changed that in significant ways. The book makes the case that so much of what made Bretton Woods useful to the US and why the US engaged with the rest of the world for the 70 years following World War II is changing. As it changes the free trade system enabled by Bretton Woods is also changing. This will have significant impact on every country in the world.

Much of what changes has to do with energy. One of the things1 Zeihan got right was his assertion that unlike much of the rest of the developed world, the US doesn't need to import energy—specifically oil—we are a net energy importer. This changes the dynamic wherein the US is willing to be the protector of shipping lanes for the entire world. As a result, the future could see a US that has the luxury of ignoring events in the Middle East, Ukraine, and elsewhere, whereas Europe (to take just one example) cannot. The book is full of other interesting predictions and conclusions just like this one. I encourage you to read it if you find this as fascinating as I do.

Zeihan makes a big deal of shale oil production, which accounted for 66% of US production in 2022. But as I read this, I was thinking about renewables. As I wrote in 2020, I've gone in big on solar power at my house, love my EV, and have replaced most things in the house (like the furnaces) with versions that run on electricity.  I did this because it made my life easier and saves me money. The fact that it's good for the environment is a bonus.

But, solar and wind are not just renewable, they also allow energy production to be decentralized in ways oil and natural gas can't. Oil and natural gas deposits are where they are. Some countries are blessed with them and the others have to buy from those countries. And they're often far away, requiring shipping through potentially hostile waters. But that's not true of renewables. They can usually be built and located where ever the need is2. This changes geopolitical equation in significant ways, but areas of the world that are not energy independent, like Europe, are moving toward renewables too slowly to prevent future energy shocks. The problem with renewables is that they're long-lead items—they take years to plan and bring online.

Petroleum and Bretton Woods enabled the modern world, providing portable, storable sources of energy that could easily and safely move to where ever it was needed.3 If we are indeed at the end of the Bretton Woods era, the world is in for significant changes as it adjusts to a life where free trade, and easy access to petroleum-based energy, cannot be assumed. Moving energy production closer to the places it's used is one strategy for dealing with this world-altering disruption. Buckle up.

Notes

There are other things that are important to the books overall conclusion besides energy. I'm just cherry picking that because I was thinking about it. For example, the US is largely self-sufficient from an overall import/export standpoint. We don't import nearly as much as many other countries and could replace what we do import relatively easily.

It's not just renewables. Nuclear power can also be located closer to demand than an oil deposit. I started my career as a nuclear metallurgist, so I'm a fan. I think many countries are going to be sorry they've closed nuclear plants and made them too hard to construct profitably.

The feats of engineering that have enabled these energy flows is truly astounding.

Photo Credit: Oil Tanker at Sunset from Terski (Pixabay)


@_Nat Zone

EUデジタルアイデンティティウォレットのリファレンス実装が公開されました

3月8日にX(旧Twitter)でお知らせしましたが、EU Digital Identity Wallet の Reference Implementation が公開されました。 GitHubのアドレスは、こちらです。曰く: 上記のページを一つづつたどって行くのはけっこう大変だ…

3月8日にX(旧Twitter)でお知らせしましたが、EU Digital Identity Wallet の Reference Implementation が公開されました。

EUDI Wallet Reference Implementation が出たようです。自分でコンパイルしないと使ってみることできなさそうですが。Open Wallet FoundationのメーリスではUX悪すぎと話題w https://t.co/KsFQPsUQ3y

— Nat Sakimura/崎村夏彦 (@_nat) March 8, 2024

GitHubのアドレスは、こちらです。曰く:

EUDIウォレットリファレンス実装は、アーキテクチャリファレンスフレームワーク(ARF)に基づいて構築され、欧州連合全体の共通規格に基づくデジタル識別、認証、電子署名のための堅牢で相互運用可能なプラットフォームを紹介することを目的としています。

EUDIウォレット・リファレンス実装は、ビジネスにとらわれない再利用可能なコンポーネントで構成されるモジュラー・アーキテクチャーに基づいており、段階的に進化させ、複数のプロジェクトで再利用することができる。具体的には、EUDI Wallet Reference Implementationの一部として、以下のコンポーネントが提供されます:

フレームワークに必要とされるライブラリとその他のソフトウェアコンポーネント 発行, 近接およびリモートフローのEUDI Walletモバイルネイティブデモアプリ Verifier のモバイルネイティブアプリおよびWebサービス(近接およびリモートフロー用) Issuer アプリとWebサービス (出所) Kanellopoulos, V., (2024), EUDI Wallet Reference Implementation, https://github.com/eu-digital-identity-wallet/.github/blob/main/profile/reference-implementation.md (2024-03-11取得)

上記のページを一つづつたどって行くのはけっこう大変だなと思っていたら、IdM実験室の富士榮さんがモジュール一覧をまとめてくださっていました。

公開されているのは、以下のモジュール群です。

Wallet Core(Android) and Wallet Kit(iOS) Coordinator Libraries Wallet Core(Android) Wallet Kit(iOS) Proximity Sharing iOS Libraries mDoc Security(iOS) mDoc Data Transfer(iOS) mDoc Data Model(iOS) Proximity Sharing Android Libraries mDoc Data Transfer(Android) Remote Presentation iOS Libraries Presentation Exchange(iOS) SIOPv2 and OpenID4VP protocols(iOS) SD-JWT(iOS) Remote Presentation Android Libraries Presentation Exchange(Android) SIOPv2 and OpenID4VP protocols(Android) SD-JWT(Android) Issuing iOS Libraries OpenId4VCI(iOS) Issuing Android Libraries OpenId4VCI(Android) Wallet Data Storage and Cryptographic Management iOS Libraries mDoc Document Storage (iOS) Wallet Data Storage and Cryptographic Management Android Libraries mDoc Document Storage (Android) Wallet UI App and demo App for Android and iOS UI / Demo App (Android) UI / Demo App (iOS) Verifier Apps and Services Web Verifier Restful API (web-services) Issuing Apps and Services OpenId4VCI issuer (Python) OpenId4VCI issuer (Kotlin) (出所)富士榮, (2024), 「EUのDigital Identity Walletのリファレンス実装が公開されています」, https://idmlab.eidentity.jp/2024/03/eudigital-identity-wallet.html
デモビデオ

基本、これらを使って各実装者がユーザインターフェース(UI)など含めて構築するのが前提ですが、リファレンス実装としてのUIも提供されており、このページからダウンロードしてインストールして試すこともできます。ただ、OSのセキュリティを弱めたりしなければならないのでハードルが高いですね。そのような方のために、動作を画面収録したビデオも提供されています(Android版のみ)。以下では、そのビデオの内容を説明します。(以下、スクショの出所はすべて「https://github.com/eu-digital-identity-wallet/eudi-app-android-wallet-ui?tab=readme-ov-file」です)

発行

発行のデモでは国民身分証明書(National Identity Document, National ID)の発行と運転免許証の発行の模様を見せてくれます。

デモ画面では、まずアプリの立ち上げスプラッシュスクリーンと思しきEU旗の画面が出、暫く待つと、ウォレットへのログイン画面と思しき6桁PINの入力画面になります。

これでユーザ認証がされると、アイデンティティ文書(Identity Document, ID)の取得画面になります。ここでは、国民身分証明書の取得ができるようになっています。運転免許証はこの段階ではできないようです。

ここで、National IDを選択すると画面が遷移して国の選択画面になります。デモではEUを選択していますがこれはあくまでデモ用途で、実際の場合には個別の国を選択するのだと思います。

「Submit」(提出)ボタンを押すと、再び画面が遷移し、当該プロバイダーへの個人情報の入力画面になります。多分ここで当該個人を識別するということなのでしょう。氏名と生年月日の入力を求められます。実際の発行にあたってはここで何らかの身元確認・当人認証が行われると思われます。

入力して送信すると、国民身分証明書が表示されます。

ここで「CONTINUE」(つづける)を押すと、この身分証明書(ID)がウォレットに格納されます。

ここで「ADD DOC」(文書を追加)を押すと、今度は運転免許証(Driving License)を足すことができるようになっています。

早速追加してみます。すると、今度は運転免許証発行者に画面が遷移します。先ほどと同じようにして個人情報を入力するとモバイル運転免許証が発行されてウォレットへ格納されます。

デモではこの後、証明書の削除もやってみせてくれます。

提示

提示においては、OpenID for Verifiable Presentation (OID4VP)+CBORでの選択提示、OID4VP+CBOR, OID4VP Customの3種類の提示フローがサポートされているようです。

フローを選ぶと、まずVerifier画面になって、どの属性をユーザに要求するかを決めることができます。

そして、NEXTをクリックすると、ぱっとユーザの画面に切り替わり、認可を求めるための画面に切り替わります。多分、この画面以降をユーザが操作するという前提だと思いますが、この辺、画面を見ているだけだと大変わかりにくい。

ここで認可する(AUTHORIZE)ボタンを押すと、画面が切り替わり6桁のPINを入れる「ログイン画面」になります。これは、ユーザのスマホの上にあるWalletにログインするということだと思います。

ログインすると一瞬Welcomeback Jan と表示され、

Verifierが次のデータを求めていますと表示されます。ここで、「なぜあなたのデータを求めるのか 8Why we need your data?)」という多分リンクである太文字部分があるので、これをクリックすると理由も見ることができるのだと思います。

ここで少し待つと、より詳細に、誰が求めてきているのか、どのデータを求めてきているのかが出てきます。

Verification Dataのセクションを開くと、これらの属性データを証明するための付随情報として何が送られるのかが出てきます。これらはデータに対するメタデータなわけですが、この情報から国籍だとかどこに住んでいるのかなどがわかってしまう可能性があります。そのことを考えると、ユーザにはもう少しわかりやすく出してあげることが求められるのではないかと思います。

誰が求めてきているかということに関しては、信頼できる受取者(Relying Party, RP)からの要求であるかどうかということを確認することもできるようになっているようです。

OKを押してTrusted relying partyの画面を閉じて前の画面に戻り、そこで「SHARE」を押すともう一度PINを求められ、情報が提供されます。

近接提示の場合

近接提示とは、NFCでタップしたりQRコードを読み取ったりして、ネットワークを介さずにローカルでデータを引き渡すユースケースです。

デモでは左側がVerifier画面、右側が提示するユーザ/Holder/Presenter画面になります。

まず、ユーザがウォレットにPINを入れてログインします。

ログインが成功すると、文書選択画面になります。

ここで、「SHOW QR or TAP」ボタンを押すとQRコードが表示されます。

次は、Verifier側での作業です。「Scan QR code」ボタンを押し、カメラを起動し、QRコードを読みます。

ここから先の流れは先程までのリモート提示の場合とほぼ同様です。

すべてうまく行くと、提供された属性が左側のVerifier画面に表示されます。

注意点〜ARFの状況について

EUデジタルアイデンティティウォレットのリファレンス実装を紹介しましたが、注意点が一つあります。リファレンス実装のページに「EUDIウォレットリファレンス実装は、アーキテクチャリファレンスフレームワーク(ARF)に基づいて構築され、欧州連合全体の共通規格に基づくデジタル識別、認証、電子署名のための堅牢で相互運用可能なプラットフォームを紹介する」とあるわけですが、このARFがまだ決まってないんですね。リンク先には3月7日段階のものが ver.1.3としてアップロードされていますが、広くコメントを募るために公開されたもので、eIDAS expert group の承認が得られなかったバージョンとのことです。Ver.1.2は3月1日に公開されており、かなりペースが早く変更が加えられていることがわかります。ですので、まだこれから変更がある可能性は見越しておいた方がよさそうです。

Sunday, 10. March 2024

Werdmüller on Medium

The internet, addiction, and me

Sometimes it’s not okay to look down from the world. Continue reading on Medium »

Sometimes it’s not okay to look down from the world.

Continue reading on Medium »

Saturday, 09. March 2024

Bill Wendels Real Estate Cafe

#Back2REFuture: Move past gated listing silos to money-saving, AI-enabled real estate ecosystem

In 1996, Bill Gates published The Road Ahead & predicted the internet would “eliminate the role of the real estate agent as we know it”. … The post #Back2REFuture: Move past gated listing silos to money-saving, AI-enabled real estate ecosystem first appeared on Real Estate Cafe.

In 1996, Bill Gates published The Road Ahead & predicted the internet would “eliminate the role of the real estate agent as we know it”. …

The post #Back2REFuture: Move past gated listing silos to money-saving, AI-enabled real estate ecosystem first appeared on Real Estate Cafe.

Friday, 08. March 2024

Werdmüller on Medium

Startup pitch: Social VIP

A way for organizations to more easily join the fediverse Continue reading on Medium »

A way for organizations to more easily join the fediverse

Continue reading on Medium »


@_Nat Zone

X (旧Twitter) タイムラインから「いいね」や「リポスト」の回数表示を削除へ

X (旧Twitter) がタイムラインから「いいね」や「リポスト」の回数表示を削除するようです。3月8日現在、タイムライン上のツイートには、返信数、リポスト数、いいね数、表示回数が出ていますが、これが表示回数だけになるようです。 Dodge Designer さんの投稿にElo…

X (旧Twitter) がタイムラインから「いいね」や「リポスト」の回数表示を削除するようです。3月8日現在、タイムライン上のツイートには、返信数、リポスト数、いいね数、表示回数が出ていますが、これが表示回数だけになるようです。

Dodge Designer さんの投稿にElon Musk氏が「これは確実にそうなる」と返事をしたことで明らかになりました。

That is definitely happening. Just view count, as proxy for the other metrics, will show on the timeline

— Elon Musk (@elonmusk) March 7, 2024

ただし、これらは個別のポストを開くことで見ることができるとのこと。イーロン・マスク氏曰く「他の指標の代替物としての表示数だけがタイムラインには出るようになる。」

タイムライン上に出すのは負荷も高いでしょうから分からなくもないですが、不満の声もあるようです。というか、リプライは「Horrible Idea」「terrible idea」などで埋まっています…。わたしもちょっと残念と思います。これで更にユーザ離れが進まないと良いのですが…

Monday, 04. March 2024

Just a Theory

Talk: State of the Extension Ecosystem

A quick reminder that I’ll be giving a brief talk on the “State of the Extension Ecosystem” on Wednesday at noon US Eastern / 17:00 UTC.

Update: 2024-03-06: Slides and video linked below.

A quick reminder that I’ll be giving a brief talk on the “State of the Extension Ecosystem” on Wednesday at noon US Eastern / 17:00 UTC. This talk is the first in a series of community talks and discussions on the postgres extension ecosystem leading up to the Extension Ecosystem Summit at pgconf.dev on May 28.

I plan to give a brief history of Postgres extension tools and distribution, the challenges encountered, recent developments, and opportunities for the future. It should take about 30 minutes, followed by discussion. Following this pattern for all the talks in the series, I hope to set up some engaging discussions and to surface significant topics ahead of the summit.

Join us! Need other information or just want an invitation without using Eventbrite, hit me up at david@ this domain, on Mastodon, or via the #extensions channel on the Postgres Slack.

Update: 2024-03-06: Great turnout and discussion, thank you! Links:

Video Keynote PDF Slides Notes More about… Postgres PGXN Extensions PGConf Summit

Monday, 04. March 2024

Damien Bod

Multi client blob storage access using ASP.NET Core with Entra ID authentication and RBAC

This article shows how to onboard different clients or organizations in an ASP.NET Core application to use separated Azure blob containers with controlled access using security groups and RBAC applied roles. Each user in a client group can only access a single blob storage and has no access to blob containers belonging to different clients. […]

This article shows how to onboard different clients or organizations in an ASP.NET Core application to use separated Azure blob containers with controlled access using security groups and RBAC applied roles. Each user in a client group can only access a single blob storage and has no access to blob containers belonging to different clients. Microsoft Entra ID is used to implement the blob storage access.

Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage

Blogs in this series Using Blob storage from ASP.NET Core with Entra ID authentication Delegated read and application write access to blob storage using ASP.NET Core with Entra ID authentication Multi client blob storage access using ASP.NET Core with Entra ID authentication and RBAC

Security context diagram

The web application uses different Enterprise applications to access the different Azure APIs. One Enterprise application is used to implement the blob application contributor access which can only be used from the application. This is used when a user in the application needs to write a file to the blob through the application. A second Enterprise application is used to create the RBAC access for the blob container reader access to the files. This is used when creating a new client or new organization. A third Enterprise application is used to implement the web application OpenID Connect web client. This is created using an Azure app registration and only allows delegated permissions. The user App roles are defined in this application. The Microsoft Graph APIs can be implemented using delegated permissions or application permissions. If using Graph application permissions to create or remove the groups, a separate Enterprise application is used to create the groups. You can also used the app service managed identity and use the service principal instead of the 3 enterprise applications to assign the required permissions.

Multi client blob storage setup

The Blob account uses Microsoft Entra ID to access the blob containers. The application can write to all containers and a security group is given RBAC blob container reader access, one per security group. Users are added to the security groups per client or per organization. The setup for the client blob container is implemented in three steps:

Create a Microsoft Entra ID security group Create an Azure storage blob container Create an RBAC to give the security group Blob storage reader permissions

Create a Microsoft Entra ID security group

The CreateSecurityGroupAsync method creates a new security group in Microsoft Entra ID. This is created using Microsoft Graph and returns the group with the group ID. The service uses application permissions and is implemented in a separate Enterprise application.

using System.Text; using Microsoft.Graph.Models; namespace MultiClientBlobStorage.Providers.GroupUserServices; public class ApplicationMsGraphService { private readonly GraphApplicationClientService _graphApplicationClientService; public ApplicationMsGraphService( GraphApplicationClientService graphApplicationClientService) { _graphApplicationClientService = graphApplicationClientService; } public async Task<Group?> CreateSecurityGroupAsync(string group) { var graphServiceClient = _graphApplicationClientService .GetGraphClientWithClientSecretCredential(); var formatted = RemoveSpecialCharacters(group); var groupName = $"blob-{formatted.Trim()}-{Guid.NewGuid()}".ToLower(); var requestBody = new Group { DisplayName = groupName, Description = $"Security group for all users from {groupName}", MailEnabled = false, MailNickname = formatted, SecurityEnabled = true }; var result = await graphServiceClient.Groups.PostAsync(requestBody); return result; } private string RemoveSpecialCharacters(string str) { var sb = new StringBuilder(); foreach (var c in str) { if (c is >= '0' and <= '9' || c is >= 'A' and <= 'Z' || c is >= 'a' and <= 'z' || c == '.' || c == '_') { sb.Append(c); } } return sb.ToString(); } }

Create an Azure storage blob container

The CreateContainer method creates a new Azure blob container using the BlobServiceClient from the Azure.Storage.Blobs nuget package.

private async Task<BlobContainerClient> CreateContainer(string name) { try { var formatted = RemoveSpecialCharacters(name); string containerName = $"blob-{formatted.Trim()}-{Guid.NewGuid()}" .ToLower(); var storage = _configuration.GetValue<string>("AzureStorage:Storage"); var credential = _clientSecretCredentialProvider .GetClientSecretCredential(); if (storage != null && credential != null) { var blobServiceClient = new BlobServiceClient(new Uri(storage), credential); var metadata = new Dictionary<string, string?> { { "name", name }, }; // Create the root container var blobContainerClient = await blobServiceClient .CreateBlobContainerAsync( containerName, PublicAccessType.None, metadata); if (blobContainerClient.Value.Exists()) { Console.WriteLine( $"Created container: {name} {blobContainerClient.Value.Name}"); } return blobContainerClient.Value; } throw new Exception($"Could not create container: {name}"); } catch (RequestFailedException e) { Console.WriteLine("HTTP error code {0}: {1}", e.Status, e.ErrorCode); Console.WriteLine(e.Message); throw; } }

Create an RBAC to give the security group Blob storage reader permissions

The ApplyReaderGroupToBlobContainer method creates an RBAC for the security group on the blob container itself. The group and the container were created in the previous steps and this takes an unknown length of time. Polly is used to repeat until the group and the container are ready and it creates the assignment.

public async Task ApplyReaderGroupToBlobContainer( BlobContainerClient blobContainer, string groupId) { var maxRetryAttempts = 20; var pauseBetweenFailures = TimeSpan.FromSeconds(3); var retryPolicy = Policy .Handle<Exception>() .WaitAndRetryAsync(maxRetryAttempts, i => pauseBetweenFailures); await retryPolicy.ExecuteAsync(async () => { // RBAC security group Blob data read await _azureMgmtClientService .StorageBlobDataReaderRoleAssignment(groupId, blobContainer.AccountName, blobContainer.Name); // NOTE service principal blob write is configured on root }); }

Azure management REST API is used to create the RBAC. This rest API is implemented using a HttpClient and uses an Enterprise application to define the required permissions. This requires an administration Azure role and with this, you have full control of the Azure tenant.

using System.Net.Http.Headers; using System.Text.Json.Serialization; namespace MultiClientBlobStorage.Providers.Rbac; public class AzureMgmtClientService { private readonly AzureMgmtClientCredentialService _azureMgmtClientCredentialService; private readonly IHttpClientFactory _clientFactory; private readonly IConfiguration _configuration; private readonly ILogger<AzureMgmtClientService> _logger; public AzureMgmtClientService(AzureMgmtClientCredentialService azureMgmtClientCredentialService, IHttpClientFactory clientFactory, IConfiguration configuration, ILogger<AzureMgmtClientService> logger) { _azureMgmtClientCredentialService = azureMgmtClientCredentialService; _clientFactory = clientFactory; _configuration = configuration; _logger = logger; } /// <summary> /// Storage Blob Data Reader: ID: 2a2b9908-6ea1-4ae2-8e65-a410df84e7d1 /// Role assignment required for application in Azure on resource group /// https://learn.microsoft.com/en-us/rest/api/authorization/role-assignments/create-by-id?view=rest-authorization-2022-04-01&tabs=HTTP /// https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-rest /// </summary> public async Task StorageBlobDataReaderRoleAssignment(string groupId, string storageAccountName, string blobContainerName) { // The role ID: Storage Blob Data Reader var roleId = "2a2b9908-6ea1-4ae2-8e65-a410df84e7d1"; var roleNameUnique = $"{Guid.NewGuid()}"; // Must be a guid var subscriptionId = _configuration["AzureMgmt:SubscriptionId"]; // the service principal ID var servicePrincipalId = groupId; // the resource group name var resourceGroupName = _configuration["AzureMgmt:ResourceGroupName"]; var objectId = $"/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/blobServices/default/containers/{blobContainerName}"; var url = $"https://management.azure.com{objectId}/providers/Microsoft.Authorization/roleAssignments/{roleNameUnique}?api-version=2022-04-01"; var client = _clientFactory.CreateClient(); var accessToken = await _azureMgmtClientCredentialService.GetAccessToken(); client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", accessToken); client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json")); var roleDefinitionId = $"{objectId}/providers/Microsoft.Authorization/roleDefinitions/{roleId}"; var PayloadRoleAssignment = new PayloadRoleAssignment { Properties = new Properties { RoleDefinitionId = roleDefinitionId, PrincipalId = servicePrincipalId, PrincipalType = "Group" } }; // view containers //var getRe = $"https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Storage/storageAccounts/{storageAccountName}/blobServices/default/containers?api-version=2023-01-01"; //var response = await client.GetAsync(getRe); //var test = await response.Content.ReadAsStringAsync(); var response = await client.PutAsJsonAsync(url, PayloadRoleAssignment); if (response.IsSuccessStatusCode) { var responseContent = await response.Content.ReadAsStringAsync(); _logger.LogInformation("Created RBAC for read group {blobContainerName} {responseContent}", blobContainerName, responseContent); return; } var responseError = await response.Content.ReadAsStringAsync(); _logger.LogCritical("Created RBAC for read group {blobContainerName} {responseError}", blobContainerName, responseError); throw new ApplicationException($"Status code: {response.StatusCode}, Error: {response.ReasonPhrase}, {responseError}"); } private class PayloadRoleAssignment { [JsonPropertyName("properties")] public Properties Properties { get; set; } = new(); } /// <summary> /// "properties": { /// "roleDefinitionId": /// "subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP_NAME/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME/providers/Microsoft.Authorization/roleDefinitions/ROLE_ID", /// "principalId": "SP_ID" /// } /// </summary> private class Properties { [JsonPropertyName("roleDefinitionId")] public string RoleDefinitionId { get; set; } = string.Empty; [JsonPropertyName("principalId")] public string PrincipalId { get; set; } = string.Empty; [JsonPropertyName("principalType")] public string PrincipalType { get; set; } = "Group"; } }

Putting it together

A Razor page can be used to create the new clients. This method takes an unknown length of time to run and the RBAC also take an unknown length of time to get applied.

[Authorize(Policy = "blob-admin-policy")] public class CreateClientModel : PageModel { private readonly ClientBlobContainerProvider _clientBlobContainerProvider; private readonly ApplicationMsGraphService _applicationMsGraphService; [BindProperty] public string ClientName { get; set; } = string.Empty; public CreateClientModel( ClientBlobContainerProvider clientBlobContainerProvider, ApplicationMsGraphService applicationMsGraphService) { _clientBlobContainerProvider = clientBlobContainerProvider; _applicationMsGraphService = applicationMsGraphService; } public void OnGet() { } public async Task<IActionResult> OnPostAsync() { if (ModelState.IsValid) { var group = await _applicationMsGraphService .CreateSecurityGroupAsync(ClientName); var blobContainer = await _clientBlobContainerProvider .CreateBlobContainerClient(ClientName); if(blobContainer != null && group != null && group.Id != null) { await _clientBlobContainerProvider .ApplyReaderGroupToBlobContainer(blobContainer, group.Id); } } return Page(); } }

Notes

This works well but requires that the application has high privileged access permissions. Most IT departments will not allow this and the creation of blob containers would have to use the IT preferred tools. This type of automation requires 2 different Azure APIs and is not well documented.

Links

Using Blob storage from ASP.NET Core with Entra ID authentication

Implementing secure Microsoft Graph application clients in ASP.NET Core

https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

https://github.com/AzureAD/microsoft-identity-web

Sunday, 03. March 2024

Foss & Crafts

62: Blender

Blender, the absolute powerhouse of FOSS 3d (and increasingly 2d) graphics! We give an overview of the software's history, some personal history of our relationships to the software, what it can do, and where we're excited to see it go! Links: Blender Blender history Grease pencil Some historical Blender videos from the NeoGeo and Not a Number days: Did It, Done It, Not a Number comme

Blender, the absolute powerhouse of FOSS 3d (and increasingly 2d) graphics! We give an overview of the software's history, some personal history of our relationships to the software, what it can do, and where we're excited to see it go!

Links:

Blender

Blender history

Grease pencil

Some historical Blender videos from the NeoGeo and Not a Number days: Did It, Done It, Not a Number commercial, Come and See

Elephants Dream, aka Project Orange

Big Buck Bunny

Previous episodes on blender:

Blender for open movie productions and education Sophie Jantak on pet portraits and Blender's Grease Pencil

Blender Conference videos mentioned:

Inklines Across The Spiderverse My Journey Across the Spider-Verse: from Hobbyist to Hollywood Forensic Architecture - spatial analysis for human rights cases

The MediaGoblin campaign video (well, the second one)

14th anniversary animation gift to Morgan

In Unexpected Places

Seams to Sewing Pattern (a Blender plugin for making clothes and stuffed animals!) (could we make Free Soft Wear patterns with it?)

Wing It!

Wing It! Production Logs and Blenderheads

Episodes about lisp, because obviously Blender needs more lisp (who's going to do it):

What is Lisp?

Lisp but Beautiful, Lisp for Everyone


Heres Tom with the Weather

Internationalizing the Fediverse

Yesterday, a request was made to like a post from an account with unicode in the username. As Terence Eden notes in Internationalise The Fediverse Mastodon (the largest ActivityPub service) doesn’t allow Unicode usernames and has resisted efforts to change. So, I tried with Irwin on otisburg.social and the code threw an exception when I tried to interact with the account (URI::Inva

Yesterday, a request was made to like a post from an account with unicode in the username. As Terence Eden notes in Internationalise The Fediverse

Mastodon (the largest ActivityPub service) doesn’t allow Unicode usernames and has resisted efforts to change.

So, I tried with Irwin on otisburg.social and the code threw an exception when I tried to interact with the account

(URI::InvalidURIError) "URI must be ascii only \"https://i18n.viii.fi/@\\u4F60\\u597D\""

This is not an acceptable uri to pass to URI.parse(). In this case, I found some help from a stackoverflow post and fixed the code so I could interact with the account and like the post.

From the comments in Terence’s blog post, I saw Allowed characters in preferredUsername has also been identified as an issue with webfinger.


Wrench in the Gears

Leap Year At Camp Hero – Lessons From Inside The Big Duck

Arrive at the appointed place and time. Swim the waters. Intuit the rules even as you recognize their fluidity. We are agents, of which Conway would approve. Choose and choose again. Those around you choose. From choices trees unfurl. Expansive arbors lick at the horizon. Pixels, neighbors, nodes blink off and on. Life as pattern [...]

Arrive at the appointed place and time.

Swim the waters.

Intuit the rules even as you recognize their fluidity.

We are agents, of which Conway would approve.

Choose and choose again.

Those around you choose.

From choices trees unfurl.

Expansive arbors lick at the horizon.

Pixels, neighbors, nodes blink off and on.

Life as pattern in motion.

Ulam and von Neumann’s cellular automata.

Signals synchronize.

Stigmergic swarms generate exquisite gliders and spaceships to speed across crystalline lattice.

A note in ballpoint on the bottom of a pencil drawer in the Hans Bethe house on Bathtub Row: “A mathematician does something on a piece of paper, and then lo and behold a big explosion can occur. S Ulam – Phila. 1967.”

And that “explosion”

Hydrogen bomb?

Or superposition?

The grail…

Atoms broken?

Or linear time?

Consider consciousness harnessed.

Put to bed Georgia Guidestone fairytales.

“Useless eaters” are not so “useless” when networked towards the Omega point.

Biodigital convergent infrastructure maximizes return on humans tethered to sacred fields.

Liquid crystal fascial computing, soulful frequencies dance through entrained torus fields.

Electroceuticals, biofeedback, “white hat” nano join to decrypt secrets beyond human knowing.

Hypercubes architected for group-mind memory storage.

Manifestation protocols where pictographic tokens masquerade as community currency.

Sophisticated storylines weave consent for Santa Fe’s B game.

Human potential harvested from the cacophony of the wise guy commons.

Is the terrain dangerous or didactic?

The conducting programmer…

Benevolent? Malevolent? Indifferent? Curious? Bored?

Can you count the degrees of freedom?

Find your way to the edge?

Flip the board?

Become the governor?

An empire of one?

One man is an island…

Chaotic cycles churn through gap junctions.

Emotional distance offers fleeting grace, a respite of unexpected order.

Thoughtlessly step into the void.

Life’s obvious turning points are outnumbered by countless split paths traversed on autopilot.

Doors are opened, opportunities are taken off the table, outcomes obscured.

Five senses cannot perceive beyond the bend,

But probiotics on standby may orchestrate the holobiont stochastic symphony.

Even as a gut-led leap year duck visitation arose en-route to Montauk.

There, textured yarns were spun of nuclear estuary rituals entangled with digital totem poles, tree spirits, astral projection, and advice on finding your way home.

Where lighthouse and radar tower flank hoodoo bluffs.

Cobbled, glacial beaches kiss Atlantic waves, a naval domain of photonics and frequency.

Microwave ether claimed sodden oak, velvet moss, lichen and quartz.

Inner knowing dispells history’s iron curtain.

Are you the hero? 

Is a hero what’s needed? 

 

 

Interactive Map: https://embed.kumu.io/595f0be99ffa93a388addc2e1b262158#untitled-map?s=bm9kZS1Id1NxMmg4Zw%3D%3D

 

 

Tuesday, 27. February 2024

Just a Theory

Extension Ecosystem Summit 2024

Some pals and I organized a summit at PGConf.dev on May 28 to work together as a community toward comprehensive indexing, discovery, and binary distribution, as well as a speaker series “mini summit” leading up to the event.

I’m pleased to announce that some pals and I have organized and will host the (first annual?) Extension Ecosystem Summit at PGConf.dev in Vancouver (and more, see below) on May 28:

Enabling comprehensive indexing, discovery, and binary distribution.

Participants will collaborate to examine the ongoing work on PostgreSQL extension distribution, examine its challenges, identify questions, propose solutions, and agree on directions for execution.

Going to PGConf? Select it as an “Additional Option” when you register, or update your registration if you’ve already registered. Hope to see you there!

Extension Ecosystem Mini-Summit

But if you can’t make it, that’s okay, because in the lead up to the Summit, to we’re hosting a series of six virtual gatherings, the Postgres Extension Ecosystem Mini-Summit.

Join us for an hour or so every other Wednesday starting March 6 to hear contributors to a variety of community and commercial extension initiatives outline the problems they want to solve, their attempts to so, challenges discovered along the way, and dreams for an ideal extension ecosystem in the future. Tentative speaker lineup:

March 6: David Wheeler, PGXN: “State of the Extension Ecosystem” March 20: Ian Stanton, Tembo: “Building Trunk: A Postgres Extension Registry and CLI” April 3: Devrim Gündüz: “yum.postgresql.org and the challenges RPMifying extensions” April 17: Jonathan Katz: “TLE Vision and Specifics” May 1: Yurii Rashkovskii, Omnigres: “Universally buildable extensions: dev to prod” May 15: (Placeholder) David Wheeler, PGXN: “Metadata for All: Enabling discovery, packaging, and community”

Hit the event page for details. Many thanks to my co-organizers Jeremy Schneider, David Christensen, Keith Fiske, and Devrim Gündüz, as well as the PGConf.dev organizers for making this all happen!

Update: 2024-03-06: Updated the talk schedule.

More about… Postgres PGXN Extensions PGConf Vancouver Summit

Tuesday, 27. February 2024

Just a Theory

RFC: Extension Metadata Typology

Thinking through the PostgreSQL extension metadata use cases and recognizing the types of information they need.

Lately I’ve been thinking a lot about metadata for Postgres extensions. Traditional use cases include control file metadata, which lives in .control files used by CREATE EXTENSION and friends, and PGXN metadata, which lives in META.json files used by PGXN to index and publish extensions. But these two narrow use cases for SQL behavior and source code distribution don’t provide the information necessary to enable other use cases, including building, installing, configuration, and more.

So I have also been exploring other metadata formats, including:

Go go.mod Cargo Manifest File npm package.json trunk.toml pgxman buildkit

These standards from neighboring communities reveal a great deal of overlap, as one might expect (everything has a name, a version, an author, license, and so on), but also types of metadata that had not occurred to me. As I took notes and gathered suggestions from colleagues and coworkers, I began to recognize natural groupings of metadata. This lead to the realization that it might be easier — and more productive — to think about these groupings rather than individual fields.

I therefore propose a typology for Postgres extension metadata.

Extension Metadata Typology Essentials

Essential information about the extension itself, including its name (or unique package name), version, list of authors, license, etc. Pretty much every metadata format encompasses this data. Ecosystem applications use it for indexing, installation locations, naming conventions, and display information.

Artifacts

A list of links and checksums for downloading the extension in one or more formats, including source code, binaries, system packages, and more. Apps use this information to determine the best option for installing an extension on a particular system.

Resources

External information about the extension, mostly links, including source code repository, bug reporting, documentation, badges, funding, etc. Apps use this data for links, of course, but also full text indexing, documentation rendering, and displaying useful information about the extension.

Contents

A description of what’s included in the extension package. Often an “extension” consists of multiple extensions, such as PostGIS, which includes postgis, postgis_tiger_geocoder, address_standardizer, and more. Furthermore, some extensions are not CREATE EXTENSION-type extension at all, such as background workers, command-line apps, libraries, etc. Each should be listed along with documentation links where they differ from the package overall (or are simply more specific).

Prerequisites

A list of external dependencies required to configure, build, test, install, and run the extension. These include not only other extensions, but also external libraries and OS-specific lists of binary package dependencies. And let’s not forget the versions of Postgres required, as well as any OS and version dependencies (e.g, does it work on Windows? FreeBSD? What versions?) and architectures (arm64, amd64, etc.)

How to Build It

Metadata that apps use to determine how to build the extension. Does it use the PostgreSQL PGXS build pipeline? Or perhaps it needs the cargo-based pgrx toolchain. Maybe a traditional ./configure && make pattern? Perl, Ruby, Python, Go, Rust, or NPM tooling? Whatever the pattern, this metadata needs to be sufficient for an ecosystem app to programmatically determine how to build an extension.

How to Install It

Usually an extension of the build metadata, the install metadata describes how to install the extension. That could be PGXS or pgrx again, but could also use other patterns — or multiple patterns! For example, perhaps an extension can be built and installed with PGXS, but it might also be TLE-safe, and therefore provide details for handing the SQL files off to a TLE installer.

This typology might include additional data, such as documentation files to install (man pages anyone?), or directories of dependent files or libraries, and the like — whatever needs to be installed for the extension.

How to Run It

Not all Postgres extensions are CREATE EXTENSION extensions. Some provide background workers to perform various tasks; others simply provide Utility applications like pg_top and pg_repack. In fact pg_repack provides both a command-line application and a CREATE EXTENSION extension in one package!

This metadata also provides configuration information, both control file parameters like trusted, superuser, and schema, but also load configuration information, like whether an extension needs its libraries included in shared_preload_libraries to enable LOAD or requires a cluster restart. (Arguably this information should be in the “install” typology rather than “run”.)

Classification

Classification metadata lets the extension developer associate additional information to improve discovery, such as key words. It might also allow selections from a curated list of extension classifications, such as the category slugs supported for the cargo categories field. Ecosystem apps use this data to organize extensions under key words or categories, making it easier for users to find extensions often used together or for various workloads or tasks.

Metrics and Reports

This final typology differs from the others in that its metadata derives from third party sources rather than the extension developer. It includes data such as number of downloads, build and test status on various Postgres/OS/version combinations, binary packaging distributions, test coverage, security scan results, vulnerability detection, quality metrics and user ratings, and more.

In the broader ecosystem, it would be the responsibility of the root registry to ensure such data in the canonical data for each extension comes only from trusted sources, although applications downstream of the root registry might extend metrics and reports metadata with their own information.

What More?

Reading through various metadata standards, I suspect this typology is fairly comprehensive, but I’m usually mistaken about such things. What other types of metadata do you find essential for the use cases you’re familiar with? Do they fit one of the types here, or do they require some other typology I’ve failed to imagine? Hit the #extensions channel on the Postgres Slack to contribute to the discussion, or give me a holler on Mastodon.

Meanwhile, I’ll be refining this typology and assigning all the metadata fields to them in the coming weeks, with an eye to proposing a community-wide metadata standard. I hope it will benefit us all; your input will ensure it does.

More about… Postgres Extensions Metadata Classification Categories RFC

Monday, 26. February 2024

Mike Jones: self-issued

COSE “typ” (type) Header Parameter Specification Addressing IETF Last Call Feedback

Orie Steele and I have updated the COSE “typ” (type) Header Parameter Specification to address feedback received during IETF Last Call. No normative changes were made. Thanks to those that reviewed the specification! The specification is available at: https://www.ietf.org/archive/id/draft-ietf-cose-typ-header-parameter-03.html Besides the spec being useful on its own, it’s worth noting that the CBO

Orie Steele and I have updated the COSE “typ” (type) Header Parameter Specification to address feedback received during IETF Last Call. No normative changes were made.

Thanks to those that reviewed the specification!

The specification is available at:

https://www.ietf.org/archive/id/draft-ietf-cose-typ-header-parameter-03.html

Besides the spec being useful on its own, it’s worth noting that the CBOR Web Token (CWT) Claims in COSE Headers specification references this spec, and so won’t exit the RFC Editor queue as an RFC until this one also does.


Damien Bod

Delegated read and application write access to blob storage using ASP.NET Core with Entra ID authentication

This article shows how an ASP.NET Core application can control the write access to an Azure blob storage container using an application app registration. Microsoft Entra ID is used to control the user access and to implement the authentication of the web application. Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage Blogs in this series The solution provides a secure upload […]

This article shows how an ASP.NET Core application can control the write access to an Azure blob storage container using an application app registration. Microsoft Entra ID is used to control the user access and to implement the authentication of the web application.

Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage

Blogs in this series Using Blob storage from ASP.NET Core with Entra ID authentication Delegated read and application write access to blob storage using ASP.NET Core with Entra ID authentication Multi client blob storage access using ASP.NET Core with Entra ID authentication and RBAC

The solution provides a secure upload and secure download using Microsoft Entra ID. Users are authenticated using Microsoft Entra ID. The blob storage containers use Microsoft Entra ID security groups to control the read access. The upload access uses the an Enterprise application for the contributor role. This is used from the ASP.NET Core application for file upload.

In Azure, only the application can upload files and the users of the groups can only read the files.

The RBAC are setup as shown:

Blob storage application write access

The application can upload files to Azure blob storage. Client secrets are used to acquire the access token using the client credentials OAuth flow. This can be further improved by using a certificate and using the client assertions or even a managed identity from the host service. For development, I normally use a secret for simplicity.

using Azure.Identity; namespace DelegatedReadAppWriteBlobStorage.FilesProvider.AzureStorageAccess; public class ClientSecretCredentialProvider { private readonly IConfiguration _configuration; public ClientSecretCredentialProvider(IConfiguration configuration) { _configuration = configuration; } public ClientSecretCredential GetClientSecretCredential() { // Values from app registration var tenantId = _configuration.GetValue<string>("AzureAd:TenantId"); var clientId = _configuration.GetValue<string>("ApplicationClient:ClientId"); var clientSecret = _configuration.GetValue<string>("ApplicationClient:ClientSecret"); var options = new ClientSecretCredentialOptions { AuthorityHost = AzureAuthorityHosts.AzurePublicCloud }; // https://docs.microsoft.com/dotnet/api/azure.identity.clientsecretcredential var clientSecretCredential = new ClientSecretCredential( tenantId, clientId, clientSecret, options); return clientSecretCredential; } }

The BlobApplicationUploadProvider uses the token and makes it possible to upload files to the Azure Blob storage. The provider uses the IFormFile as a parameter for the file data. Meta data is persisted in the local SQL database.

using Azure.Storage.Blobs; using Azure.Storage.Blobs.Models; namespace DelegatedReadAppWriteBlobStorage.FilesProvider.AzureStorageAccess; public class BlobApplicationUploadProvider { private readonly IConfiguration _configuration; private readonly ClientSecretCredentialProvider _clientSecretCredentialProvider; public BlobApplicationUploadProvider(ClientSecretCredentialProvider clientSecretCredentialProvider, IConfiguration configuration) { _configuration = configuration; _clientSecretCredentialProvider = clientSecretCredentialProvider; } public async Task<string> AddNewFile(BlobFileUploadModel blobFileUpload, IFormFile file) { try { return await PersistFileToAzureStorage(blobFileUpload, file); } catch (Exception e) { throw new ApplicationException($"Exception {e}"); } } private async Task<string> PersistFileToAzureStorage( BlobFileUploadModel blobFileUpload, IFormFile formFile, CancellationToken cancellationToken = default) { var storage = _configuration.GetValue<string>("AzureStorage:StorageAndContainerName"); var fileFullName = $"{storage}/{blobFileUpload.Name}"; var blobUri = new Uri(fileFullName); var blobUploadOptions = new BlobUploadOptions { Metadata = new Dictionary<string, string?> { { "uploadedBy", blobFileUpload.UploadedBy }, { "description", blobFileUpload.Description } } }; var blobClient = new BlobClient(blobUri, _clientSecretCredentialProvider.GetClientSecretCredential()); var inputStream = formFile.OpenReadStream(); await blobClient.UploadAsync(inputStream, blobUploadOptions, cancellationToken); return $"{blobFileUpload.Name} successfully saved to Azure Blob Storage Container"; } }

Blob storage delegated read access

The blob storage reader access is given to all users in the security group. This security group uses the app-role from the Azure app registration and the group can access the blob container, not the blob account. No SAS are used .

using Azure.Core; using Microsoft.Identity.Client; using Microsoft.Identity.Web; namespace DelegatedReadAppWriteBlobStorage.FilesProvider.AzureStorageAccess; public class DelegatedTokenAcquisitionTokenCredential : TokenCredential { private readonly ITokenAcquisition _tokenAcquisition; private readonly IConfiguration _configuration; public DelegatedTokenAcquisitionTokenCredential(ITokenAcquisition tokenAcquisition, IConfiguration configuration) { _tokenAcquisition = tokenAcquisition; _configuration = configuration; } public override AccessToken GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken) { throw new NotImplementedException(); } public override async ValueTask<AccessToken> GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken) { string[]? scopes = _configuration["AzureStorage:ScopeForAccessToken"]?.Split(' '); if (scopes == null) { throw new Exception("AzureStorage:ScopeForAccessToken configuration missing"); } AuthenticationResult result = await _tokenAcquisition .GetAuthenticationResultForUserAsync(scopes); return new AccessToken(result.AccessToken, result.ExpiresOn); } } Notes

Using an application for write access to the blob storage has both advantages and disadvantages in context of security. Removing the write access to all users is positive, but when using applications or if the application has weak security, this approach can be bad. Using a managed identity for the deployed service would improve the security and would remove the need for secrets or certificates to authenticate the application. This prepares the access for the next solution which would be to implement a multi-tenant or multi-client application using the Azure blob.

Links

Using Blob storage from ASP.NET Core with Entra ID authentication

Implementing secure Microsoft Graph application clients in ASP.NET Core

https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

https://github.com/AzureAD/microsoft-identity-web

Saturday, 24. February 2024

Heres Tom with the Weather

Morris Lukowich at Saint Arnold Brewing

Today I was fortunate to have my picture taken with Morris Lukowich during Hockey Day in Houston at Saint Arnold Brewing. The best hockey games I have ever watched in-person were the 1977-78 Houston Aeros. The WHA All-Star game was in Quebec City on January 17, 1978 and for the picture I believe it is Lukowich’s All-Star jersey from that game that I have borrowed.

Today I was fortunate to have my picture taken with Morris Lukowich during Hockey Day in Houston at Saint Arnold Brewing. The best hockey games I have ever watched in-person were the 1977-78 Houston Aeros. The WHA All-Star game was in Quebec City on January 17, 1978 and for the picture I believe it is Lukowich’s All-Star jersey from that game that I have borrowed.

Friday, 23. February 2024

Mike Jones: self-issued

Continued refinement: OpenID Federation draft 33 published

OpenID Federation draft 33 has been published at https://openid.net/specs/openid-federation-1_0-33.html and https://openid.net/specs/openid-federation-1_0.html. The working group continues refining the specification to make it more consistent and easier to read and implement. We published draft 33 now to get these improvements out to implementers. Per the history entries at https://openid.net/specs

OpenID Federation draft 33 has been published at https://openid.net/specs/openid-federation-1_0-33.html and https://openid.net/specs/openid-federation-1_0.html. The working group continues refining the specification to make it more consistent and easier to read and implement.

We published draft 33 now to get these improvements out to implementers. Per the history entries at https://openid.net/specs/openid-federation-1_0-33.html#name-document-history, a summary of changes made in -32 and -33 is:

-33:

Addressed #2111: The metadata_policy_crit claim MAY only appear in Subordinate Statements and its values apply to all metadata_policies found in the Trust Chain. Fixed #2096: Authorization Signed Request Object may contain trust_chain in its payload and should not in its JWS header parameters. Strengthen language requiring client verification with automatic registration. Fixed #2076: Promoted Trust Marks to be a top-level section. Added General-Purpose JWT Claims section. Moved Federation Endpoints section before Obtaining Federation Entity Configuration Information section. Fixed #2110: Explanation text when multiple entity_type parameters are provided in the Subordinate Listing endpoint. Fixed #2112, #2113, and #2114: Defined that client authentication is not used by default and that the default client authentication method, when used, is private_key_jwt. Specified that requests using client authentication use HTTP POST. Fixed #2104: Allow trust marks in Subordinate Statements for implementation profiles that might want this. Fixed #2103: Addressed ambiguities in the definition of constraints.

-32:

Tightened OpenID Connect Client Registration section. Tightened appendix examples. Fixed #2075: Trust Mark endpoint for the provisioning of the Trust Marks. Fixed #2085: Trust Marked Entities Listing, added sub URL query parameter. Made fetch issuer unambiguous by making the iss parameter REQUIRED. Introduced the term “Subordinate Statement” and applied it throughout the specification. Also consistently use the term “registration Entity Statement” for Explicit Client Registration results. Clarified where Entity Statement claims can and cannot occur. Renamed policy_language_crit to metadata_policy_crit. Fixed #2093: Numbered the list defining the order policy operators are applied in.

Special thanks to Stefan Santesson for his thorough review of the specification in the context of the Swedish Federation deployment!


Phil Windleys Technometria

Identity Metasystems and Lessons from Building the Sovrin Foundation

I recently spoke with Riley Hughes of Trinsic on his Future of Identity podcast about the birth of Sovrin Foundation, it's inevitable growing pains, self-sovereign identity, identity metasystems, and adoption. Give it a listen. I'm grateful to Riley for having me on as a guest.

I recently spoke with Riley Hughes of Trinsic on his Future of Identity podcast about the birth of Sovrin Foundation, it's inevitable growing pains, self-sovereign identity, identity metasystems, and adoption. Give it a listen.

I'm grateful to Riley for having me on as a guest.

Some of the ideas in this podcast are explored more deeply in my latest book, Learning Digital Identity from O’Reilly Media. You can support my writing by buying a copy!

If you enjoyed this, please consider sharing it with a friend or twenty.


Heres Tom with the Weather

Doctors Without Borders urgent update on #Gaza to UN Security Council

MSF to UN Security Council: The people of Gaza need an immediate and sustained ceasefire now

Thursday, 22. February 2024

Justin Richer

Applying RAR in OAuth 2 (and GNAP)

The Rich Authorization Request extension to OAuth 2, or RAR, is a way to talk about access in the OAuth space beyond what scopes allow, and it is defined in RFC9396. One of the key motivations behind RAR was admitting, as a community, that a lot of times you need more than a set of scope values to properly describe access to an API. A Larger Scope Scopes work pretty well in OAuth, but

The Rich Authorization Request extension to OAuth 2, or RAR, is a way to talk about access in the OAuth space beyond what scopes allow, and it is defined in RFC9396. One of the key motivations behind RAR was admitting, as a community, that a lot of times you need more than a set of scope values to properly describe access to an API.

A Larger Scope

Scopes work pretty well in OAuth, but they’re ultimately just a set of strings. It’s worlds better than the all-or-nothing access that OAuth 1 or API keys give you, and if you have an API, they allow you to easily separate read and write access. But what if you want to be more specific? What if you want to, say, let someone access a specific account for certain features for a certain amount of time? Or read in one place but write to another? And what if you want to compose that set of features differently, such as any account for an unlimited time but only if the user’s online? The ways of describing API access are as varied as the APIs being accessed.

When faced with this problem, the first thing that many people realize is that because scopes are just strings, their own API can impose semantics and syntax on those strings. So people add parameters, or define composition rules, or even define a whole query language embedded into the scope string. But those approaches require a lot of complex processing and lack interoperable structure that would let you easily combine protection of multiple kinds of APIs.

What RAR gives you, instead, is an array of JSON objects to describe your API access. So instead of trying to cram everything into a single string, you can now put together an object that defines exactly what you want.

[
{
"type": "payment_initiation",
"actions": [
"initiate",
"status",
"cancel"
],
"locations": [
"https://example.com/payments"
],
"instructedAmount": {
"currency": "EUR",
"amount": "123.50"
},
"creditorName": "Merchant A",
"creditorAccount": {
"iban": "DE02100100109307118603"
},
"remittanceInformationUnstructured": "Ref Number Merchant"
}
]

This object is specific to the API in question and carries with it all the detail that is needed for successful processing. Each kind of API can define its own type value, which in turn defines what’s allowed to go into the rest of the object. And if you need more than one view of things, like read access to A but write access to B, then you can pass in multiple objects in the same structure.

One important question arises out of this: who needs to know this level of detail?

Who Cares About RAR

In this regard, RAR really is built on top of the concept of a scope. In an OAuth delegation, there are four parties. The client, resource owner, authorization server (AS), and resource server (RS).

The four parties in OAuth delegation, illustration from OAuth 2 In Action

These parties have particular relationships, and each of them might care about a RAR object or a scope in a slightly different way. However, the more important question is about which relationship is in play.

Client -> AS: Requesting Access

.When requesting an access token, the client needs to be able to describe to the AS what it wants. RAR allows the client to get VERY specific, if the client knows what details it wants ahead of time. Maybe the client has prompted the resource owner for an account identifier, or has learned through some other protocol where the target system is located, or it’s just been configured to know that it needs to ask for specific objects in order to do specific things. In all of these cases, the client can send RAR objects to the AS just like it would a scope, in the hopes of getting an access token that can do what it asks for.

AS -> Client: Granting Access

When the access token is granted, the AS can tell the client which RAR objects have been applied to the token. While this information is no substitute for an API discovery protocol, this approach can let the client differentiate what an access token is good for in different dimensions. For example, a client can ask for a token for an available signing service, and then be granted a token for use at a specific signing service, indicated through the locations field in the RAR object.

AS -> Resource owner: Gathering Authorization

During the delegation process, the AS often needs to prompt the resource owner to see if they’re OK with what’s being delegated. While this starts as the Client->AS request, RAR gives the AS an opportunity to fine-tune the access by asking the resource owner to be specific, or even filling in values that get put into the resulting RAR object. Maybe the client is asking for account access but the resource owner stipulates that it’s only good for the next five minutes. This does come at a usability cost, since it’s much easier to display a list of scope strings with checkboxes. But experience has shown that this list is not a great security measure anyway, since most users won’t change the checkboxes, and often don’t understand the differentiated access being granted.

AS -> RS: Describing Access

The access token itself represents a certain set of rights that have been granted. These can be described in the metadata of the token, available through either a structured token field or an introspection response. In this way, the RS can learn what an access token is good for, and apply its policies appropriately. Does the token grant access for the HTTP GET command on the resource at /photo/123-fda1d? Is this token even good at this specific RS, or is it meant for somewhere else? The RAR object can be used to describe all of this.

Not Everything Has To Match

Finally, it’s important to note that the all of these different branches need not match each other in a single transaction. In one of the applications where I’ve personally deployed RAR, the client never sees the RAR objects. The client knows to ask for a specific scope, and the AS knows that when it sees that scope, the resulting token needs to apply to a whole set of things represented by the current user’s access within the system. The downstream APIs know nothing about users or accounts, but they do know the resources they protect.

As a consequence, the AS translates the client’s incoming scope request to a set of RAR objects that the APIs understand. The APIs never see or care about the scope, and the client never sees or cares about the RAR. In this way, internal API details stay internal and do not leak unnecessarily into the wider system.

However, a different client in this same ecosystem does have insight into the details of the API structure, and therefore its requests do specify RAR objects that target the APIs. These objects are processed in exactly the same way by the API servers, which gives us a powerful parallelism and profound code reuse in production.

GNAP Native

In GNAP, one of our main goals was to see what an OAuth-style system would look like without the constraints and history of OAuth, and one such constraint includes scopes. Consequently, GNAP’s native access rights is an array of objects that look suspiciously like RAR objects. This design is, of course, intentional, and in many ways RAR is the backport of GNAP’s access rights system to work on top of OAuth 2. While GNAP doesn’t have scopes in the same way, GNAP’s reference-based approach to its API design does allow for the use of a simple string to stand in for the objects in question, allowing a request to have both shortcut and fully specified items in the same request.

"access": [
{
"type": "photo-api",
"actions": [
"read",
"write"
],
"locations": [
"https://server.example.net/",
"https://resource.local/other"
],
"datatypes": [
"metadata",
"images"
],
"geolocation": [
{ lat: -32.364, lng: 153.207 },
{ lat: -35.364, lng: 158.207 }
]
},
{
"type": "financial-transaction",
"actions": [
"withdraw"
],
"identifier": "account-14-32-32-3",
"currency": "USD"
},
"dolphin-metadata",
"some other thing"
] How Can I Use RAR?

RAR support is starting to show up across different vendors, though it’s not universal yet. One of the companies I work for, Authlete, supports RAR natively. Other products can often have RAR grafted on top, since it takes the form of an extra parameter to be processed by an extension or module.

The real value is that we are starting to see API access defined in terms of RAR objects, replacing the awkward and error-prone string composition practices of the past. RAR may seem complex, but when you look at how APIs are defined and scopes are used, the power of that complexity really starts to show its value.


Just a Theory

The History and Future of Extension Versioning

What versioning standard should be used for Postgres extension distribution? Some context from PostgreSQL and PGXN, a survey of the version standard landscape today, and a recommendation.

Every software distribution system deals with versioning. Early in the design of PGXN, I decided to require semantic versions (SemVer), a clearly-defined and widely-adopted version standard, even in its pre-1.0 specification. I implemented the semver data type that would properly sort semantic versions, later ported to C by Sam Vilain and eventually updated to semver 2.0.0.

As I’ve been thinking through the jobs and tools for the Postgres extension ecosystem, I wanted to revisit this decision, the context in which it was made, and survey the field for other options. Maybe a “PGXN v2” should do something different?

But first that context, starting with Postgres itself.

PostgreSQL Extension Version Standard

From the introduction extensions in PostgreSQL 9.1, the project side-stepped the need for version standardization and enforcement by requiring extension authors to adopt a file naming convention, instead. For example, an extension named “pair” must have a file with its name, two dashes, then the version as listed in its control file, like so:

pair--1.1.sql

As long as the file name is correct and the version part byte-compatible with the control file entry, CREATE EXTENSION will find it. To upgrade an extension the author must provide a second file with the extension name, the old version, and the new version, all delimited by double dashes. For example, to upgrade our “pair” extension to version 1.2, the author supply all the SQL commands necessary to upgrade it in this file:

pair--1.1--1.2.sql

This pattern avoids the whole question of version standards, ordering for upgrades or downgrades, and all the rest: extension authors have full responsibility to name their files correctly.

PGXN Versions

SemVer simplified a number of issues for PGXN in ways that the PostgreSQL extension versioning did not (without having to re-implement the core’s file naming code). PGXN wants all metadata for an extension in its META.json file, and not to derive it from other sources that could change over time.

Following the CPAN model, PGXN also required that extension releases never decrease the version.1 The well-defined sortability of semantic versions made this validation trivial. PGXN later relaxed enforcement to allow updates to previously-released versions. SemVer’s clearly specified sorting made this change possible, as the major.minor.patch precedence intuitively compare from left to right.

In other words, if one had previously released version 1.2.2, then released 1.3.0, a follow-up 1.2.3 is allowed, increasing the 1.2.x branch version, but not, say, 1.2.1, which decreases the 1.2.x branch version.

Overall, semantic versions have been great for clarity of versioning of PGXN extensions. The one bit of conflict comes from extensions that use some other other version standard in the control file, usually a two-part x.y version not allowed by SemVer, which requires x.y.z (or, more specifically, major.minor.patch).

But such versions are usually compatible with SemVer, and because PGXN cares only about the contents of the META.json, they’re free to use their own versions in the control file, just as long as the META.json file uses SemVers.

For example, the recent nominatim_fdw v1.0.0 release, which of course lists "version": "1.0.0" in its META.json file, sticks to its preferred default_version = '1.0' in its control file. The extension author simply appends .0 to create a valid SemVer from their preferred version, and as long as they never use any other patch number, it remains compatible.

Versioning Alternatives

Surveying the versioning landscape in 2024 yields a number of approaches. Might we prefer an alternative for future extensions distribution? Let’s look at the possibilities.

Ad Hoc Versions

As described above, the Postgres file naming convention allows ad hoc versions. As far as I can tell, so does the R Project’s CRAN. This approach seems fine for systems that don’t need to follow version changes themselves, but much trickier for systems that do. If I want to install the latest version of an extension, how does the installer know what that latest version is?

The answer is that the extension author must always release them in the proper order. But if someone releases 1.3.1 of an extension, and then 1.2.1, well then 1.2.1 is the latest, isn’t it? It could get confusing pretty quickly.

Seems better to require some system, so that download and install clients can get the latest version — or the latest maintenance version of an earlier release if they need it.

User Choice

Quite a few registries allow users to choose their own versioning standards, but generally with some very specific recommendations to prevent confusion for users.

Python Packaging is fairly liberal in the versions it allows, but strongly recommends semantic versioning or calendar versioning (more on that below). CPAN (Perl) is also fairly liberal, due to its long history of module distribution, but currently requires “Decimal versions”, which are evaluated as floating-point numbers, or dotted integer versions, which require three dot-separated positive integers and must begin with the letter v. RubyGems does not enforce a versioning policy, but warns that “using an ‘irrational’ policy will only be a disservice to those in the community who use your gems.” The project therefore urges developers to follow SemVer.

These three venerable projects date from an earlier period of registration and distribution, and have made concessions to times when no policies existed. Their solutions either try to cover as many legacy examples as possible while recommending better patterns going forward (Python, Perl), or simply make recommendations and punt responsibility to developers.

SemVer

More recently-designed registries avoid this problem by requiring some level of versioning standard from their inception. Nearly all use SemVer, including:

Go Modules, where “Each version starts with the letter v, followed by a semantic version.” Cargo (Rust), which “uses SemVer for specifying version numbers. This establishes a common convention for what is compatible between different versions of a package.” npm, where the “version must be parseable by node-semver, which is bundled with npm as a dependency.” CalVer

CalVer eschews context-free incrementing integers in favor of semantically-meaningful versions, at least for some subset of a version string. In other words: make the version date-based. CalVer-versioned projects usually include the year and sometimes the month. Some examples:

Ubuntu uses YY.0M.MICRO, e.g., 23.04, released in April 2023, and 23.10.1, released in October 2023 Twisted uses YY.MM.MICRO, e.g., 22.4.0, released in April 2022

Ultimately, adoption of a CalVer format is a more choice about embedding calendar-based meaning into a version more than standardizing a specific format. One can of course use CalVer semantics in a semantic version, as in the Twisted example, which is fully-SemVer compliant.

In other words, adoption of CalVer need not necessitate rejection of SemVer.

Package Managers

What about package managers, like RPM and Apt? Some canonical examples:

RPM packages use the format:

<name>-<version>-<release>.<architecture>

Here <version> is the upstream version, but RPM practices a reasonable (if baroque) version comparison of all its parts. But it does not impose a standard on upstream packages, since they of course vary tremendously between communities and projects.

Apt packages use a similar format:

[epoch:]upstream_version[-debian_revision]

Again, upstream_version is the version of the upstream package, and not enforced by Apt.

APK (Alpine Linux) packages use the format

{digit}{.digit}...{letter}{_suf{#}}...{-r#}

I believe that {digit}{.digit}...{letter} is the upstream package version.

This pattern makes perfect sense for registries that repackage software from dozens of upstream sources that may or may not have their own policies. But a system that defines the standard for a specific ecosystem, like Rust or PostgreSQL, need not maintain that flexibility.

Recommendation

Given this survey, I’m inclined to recommend that the PostgreSQL community follow the PGXN (and Go, and Rust, and npm) precedent and continue to rely on and require semantic versions for extension distribution. It’s not perfect, given the contrast with the core’s lax version requirements. CalVer partisans can still use it, though with fewer formatting options (SemVer forbids leading zeros, as in the Ubuntu 23.04 example).

But with its continuing adoption, and especially its requirement by more recent, widely-used registries, and capacity to support date-based semantics for those who desire it, I think it continues to make the most sense.

Wrong!

I’m probably wrong. I’m often mistaken in one way or another, on the details or the conclusion. Please tell me how I’ve messed up! Find me on the #extensions channel on the Postgres Slack or ping me on Mastodon.

Why? Because every module on CPAN has one and only one entry in the index file. Ricardo Signes explains↩︎

More about… Postgres Extensions PGXN Versioning SemVer CalVer

Wednesday, 21. February 2024

David Kelts on ID

“Ripping” IDs Should Go the Way of CDs…

The Solvable Problems of Mobile KYC Remember endless hours ripping your CD collection to digital…why are we spending endless hours scanning analog ID documents over and over again? Alternatives exist. Let’s think differently about how Mobile KYC fails customers A recent thread on a work-oriented social platform caught my attention. Some people I know to be the people they are could not veri
The Solvable Problems of Mobile KYC

Remember endless hours ripping your CD collection to digital…why are we spending endless hours scanning analog ID documents over and over again? Alternatives exist. Let’s think differently about how Mobile KYC fails customers

A recent thread on a work-oriented social platform caught my attention. Some people I know to be the people they are could not verify their profiles using a mobile KYC solution. This article isn’t to bash those solutions, or the myriad improvements made, but to urge our industry to understand the complexity we’ve foisted on our users and incite action to do different.

Analog to Digital Conversion, anyone? “Ripping IDs” to confirm identity should go the way of Ripping CDs… How do I know the problems that exist?

In December 2014, before a presentation to the Identity Ecosystem Steering Group about how to “Bring the trust of the Driver’s license online”, MorphoTrust USA filed a provisional application that became US Patent 10,678,939. The front-back-selfie wave was born, admittedly without the system of record connection described in that patent text. I spent several years leading a smart team improving an identity verification product line.

I’ve watched thousands of people try to run these selfie-to-DL matching steps in various forms and helped debug their issues with smiles, vision acuity, phone models, and lighting. I’ve spent weekends helping hundreds of great Utahns use mobile registration live in a credit union lobby.

Visually impaired people cannot get over these hurdles. It takes too many hands for these solutions to be inclusive. Back-facing phone camera quality varies greatly with how much you can spend. These solutions are not inclusive by any stretch of the imagination, and the fallback is to go to a physical location.

The Places Within the Selfie-ID Verification Process that Fail Manually taking a quality, flat, well-lit, high-res picture of a card is hard Taking an ICAO-quality, well-lit, machine-matchable portrait is difficult. Everybody wants to smile (or make duck lips …idk…) in a selfie Webcams are very low resolution compared to phones and vary widely PDF417 Barcodes from the back of DL/ID cards are easy to generate and nearly every Fake ID has the same information on the back and front Connections to databases that would verify the authenticity and validity of a DL/ID are expensive or restricted to law enforcement (makes sense) Most hard-to-forge security features of an ID card require multi-spectral light, are tactile, or require magnification. Phones capture visible light Cropping the card portrait results in a small, low-res photo for machine-matching that has security lines, holograms, and indicia through it Cropping a portrait from a poorly captured, low-res ID card will not give sufficient resolution to measure the facial features for matching You need cloud/phone processing power to determine that a human is live and real for the probe image — called Presentation Attack Detection I have to ID myself repeatedly with each new service that wants (notice I didn’t say “needs”) ID and each KYC system differs, creating doubt

Can I stop here? We haven’t even gotten to the accuracy of biometric matching algorithms (new entries, if used, have solved any racial/gender bias, so lighting is the major hurdle to accurate matching). People think biometrics are 100% when a human visual verificdation is only ~97% (per a 2017 Google study that I cannot find).

Oh, did I mention that people end up with images of their IDs in photo rolls and synchronized to every app they grant photo library permission?

Humans are humans and will do what humans do using technology.

You simply cannot account for the variations people and machines will cause to happen (credit Jurassic Park)

If there are this many variables to a process that seems conceptually simple, one or more of those variables will go wrong.

We are turning away good potential customers

There are tremendous capture, scan, photo, and biometric matching systems out there from excellent, reputable vendors. The improvements in the decade since selfie matching started are incredible and laudable. This process is significantly easier and more accurate than it ever was.

Still, people sitting at their desks get turned away from having “verified profiles”. Now try taking a picture of your ID with one hand, holding your phone in the other, on a sidewalk balancing your potential rental scooter against your hip in bright, vacation sunlight. Ya, no!

Let’s go Digital

Mobile Driver’s Licenses (mDL) are now available to greater than 17% of the US cardholding population. Adoption isn’t awesome yet because people don’t know they have existed in production for nearly 2 years and in concept for four-plus years. ISO/IEC 18013–5 is solid and functional. The data is digitally signed by your Issuer but you control it. It provides a level of control over data and transparency into data-storage that users haven’t experienced for decades. There are verifier apps and equipment from multiple vendors. Places do accept mDL.

In the coming months of 2024, ISO/IEC 18013–7 will add consented, selective-disclosure, digitally-signed mDL sharing to apps on the same device AND to websites over the Internet. (This will be the topic of a forthcoming post of mine.) So imagine sharing just your age to be granted entry to that website you don’t want your name associated with… or accurately filing for benefits without typing into a form. During the final approval stages, the ISO/IEC 18013–7 Technical Specification will be available for implementation.

Let’s Go Digital to Digital

“The first bank to replace their Mobile KYC with a connection to a Mobile Driver’s License is going to garner attention from the press and obtain new customers.”

It’s time.

The mDL provides digitally signed, matchable portraits upon consent as well as only the digitally signed data relevant for the approval. ISO/IEC 18013–7 will allow app-to-app consent and retrieval of mDL data similar to the way your email launches your airline application. It will also present a subset of mDL data to a website after consent for uses like age verification.

The challenge is made: Let’s short-circuit problems 1, 3, 4, 5, 6, 7, 8, and 10 above. Let’s make better self-registration and selfie-KYC solutions. Let’s work together on mDL acceptance. Who will sign up:

Relying Party (bank): _____________________________________

Biometric Match and Liveness Vendor: ______________________

mDL Solution Provider: ____________________________________

Test Subject mDL Holders: _________________________________

David can be reached for comments and feedback at david at decipher dot id

Wednesday, 21. February 2024

Just a Theory

Extension Ecosystem Jobs to be Done

The challenges of the current Postgres extension ecosystem and the interest and energy put into exploring new solutions make clear that the time has come to revisit the whole idea. We begin with a survey of the jobs to be done by extensions packaging and distribution.

Over on the Tembo blog I’ve published a thinking-through of what all the jobs to be done of the ideal Postgres extension ecosystem might be:

These challenges and the interest and energy put into exploring new solutions make clear that the time has come to revisit the whole idea of the PostgreSQL extension ecosystem: to work though the jobs to be done, specify the tools to do those jobs, and outline a plan for the broader Postgres community to design and build them.

Future posts will dream up the tools and make the plan; today we begin with the jobs.

🎬 Let’s get started.

More about… PGXN Extensions Metadata Jobs to be Done

Monday, 19. February 2024

Heres Tom with the Weather


Damien Bod

Using a CSP nonce in Blazor Web

This article shows how to use a CSP nonce in a Blazor Web application using the InteractiveServer server render mode. Using a CSP nonce is a great way to protect web applications against XSS attacks and other such Javascript vulnerabilities. Code: https://github.com/damienbod/BlazorServerOidc Notes The code in this example was built using the example provided by […]

This article shows how to use a CSP nonce in a Blazor Web application using the InteractiveServer server render mode. Using a CSP nonce is a great way to protect web applications against XSS attacks and other such Javascript vulnerabilities.

Code: https://github.com/damienbod/BlazorServerOidc

Notes

The code in this example was built using the example provided by Javier Calvarro Nelson.

https://github.com/javiercn/BlazorWebNonceService

Services and middleware

The Blazor Web application is implemented using the AddInteractiveServerComponents for the InteractiveServer server render mode. The nonce can be used by implementing a nonce service using the CircuitHandler. The nonce service is a scoped service.

builder.Services.AddRazorComponents() .AddInteractiveServerComponents(); builder.Services.TryAddEnumerable(ServiceDescriptor .Scoped<CircuitHandler, BlazorNonceService>(sp => sp.GetRequiredService<BlazorNonceService>())); builder.Services.AddScoped<BlazorNonceService>();

The headers are implemented using the NetEscapades.AspNetCore.SecurityHeaders package. The headers are added to the Blazor nonce service using the NonceMiddleware middleware.

app.UseSecurityHeaders(SecurityHeadersDefinitions.GetHeaderPolicyCollection( app.Environment.IsDevelopment(), app.Configuration["OpenIDConnectSettings:Authority"])); app.UseMiddleware<NonceMiddleware>(); Setup Security headers

The security headers CSP script tag is setup as best possible for a Blazor Web application. A CSP nonce is used as well as the fallback definitions for older browsers.

.AddContentSecurityPolicy(builder => { builder.AddObjectSrc().None(); builder.AddBlockAllMixedContent(); builder.AddImgSrc().Self().From("data:"); builder.AddFormAction().Self().From(idpHost); builder.AddFontSrc().Self(); builder.AddBaseUri().Self(); builder.AddFrameAncestors().None(); builder.AddStyleSrc() .UnsafeInline() .Self(); // due to Blazor builder.AddScriptSrc() .WithNonce() .UnsafeEval() // due to Blazor WASM .StrictDynamic() .OverHttps() .UnsafeInline(); // fallback for older browsers when the nonce is used }) Setup Middleware to add the nonce to the state

The NonceMiddleware uses the nonce header created by the security headers package and sets the Blazor nonce service with the value. This is updated on every request.

namespace BlazorWebFromBlazorServerOidc; public class NonceMiddleware { private readonly RequestDelegate _next; public NonceMiddleware(RequestDelegate next) { _next = next; } public async Task Invoke(HttpContext context, BlazorNonceService blazorNonceService) { var success = context.Items.TryGetValue( "NETESCAPADES_NONCE", out var nonce); if (success && nonce != null) { blazorNonceService.SetNonce(nonce.ToString()!); } await _next.Invoke(context); } } Using the nonce in the UI

The BlazorNonceService can be used from the Blazor components in the InteractiveServer render mode. The nonce is applied to all script tags. If the script does not have the correct nonce, it will not be loaded. The GetNonce method reads the nonce value from the BlazorNonceService service.

@inject IHostEnvironment Env @inject BlazorNonceService BlazorNonceService @using System.Security.Cryptography; <!DOCTYPE html> <html lang="en"> <head> <meta charset="utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <base href="/" /> <link rel="stylesheet" href="css/bootstrap/bootstrap.min.css" /> <link href="css/site.css" rel="stylesheet" /> <link href="BlazorWebFromBlazorServerOidc.styles.css" rel="stylesheet" /> <HeadOutlet @rendermode="InteractiveServer" /> </head> <body> <Routes @rendermode="InteractiveServer" /> http://_framework/blazor.web.js </body> </html> @code { /// <summary> /// Original src: https://github.com/javiercn/BlazorWebNonceService /// </summary> [CascadingParameter] HttpContext Context { get; set; } = default!; protected override void OnInitialized() { var nonce = GetNonce(); if (nonce != null) { BlazorNonceService.SetNonce(nonce); } } public string? GetNonce() { if (Context.Items.TryGetValue("nonce", out var item) && item is string nonce and not null) { return nonce; } return null; } } Notes

Nonces can be applied to Blazor Web using the server rendered mode and the BlazorNonceService which implements the CircuitHandler. Thanks the Javier Calvarro Nelson for providing a solution to this. Next would be to find a solution for the AddInteractiveWebAssemblyComponents setup. You should always use a CSP nonce on a server rendered application and only load scripts with the CSP nonce applied to it.

Links

https://github.com/javiercn/BlazorWebNonceService

https://github.com/andrewlock/NetEscapades.AspNetCore.SecurityHeaders

Friday, 16. February 2024

IdM Thoughtplace

Regarding the recent SAP IDM Announcement

 “Life begins like a dream, becomes a little real, and ends like a dream.” ― Michael Bassey Johnson, The Oneironaut’s Diary As many of you already know, SAP has made public its plans on how SAP IDM will be retired as a supported offering. I’ve been stewing on this for a bit as I try to figure out exactly how I feel about this and what needs to happen next. To be fair, I haven’t

 “Life begins like a dream, becomes a little real, and ends like a dream.” ― Michael Bassey Johnson, The Oneironaut’s Diary

As many of you already know, SAP has made public its plans on how SAP IDM will be retired as a supported offering. I’ve been stewing on this for a bit as I try to figure out exactly how I feel about this and what needs to happen next.

To be fair, I haven’t worked with the product much for just over four years, and even then, I was working more with Version 7 than with Version 8. My opinions are completely my own and do not represent my current employer, any previous employer, or SAP.

While IDM is certainly showing its age, there are some very good things about it that I would love to see as an open-source offering. First is the Batch Processing capabilities of IDM, based on the old MaXware Data Synchronization Engine/MetaCenter solutions. It features some powerful functionality to synchronize and cleanse data. It sets up fairly easily and is quite easy to configure. I’m sure the open-source community could do well with maintaining the UI (It definitely should be JAVA-based rather than the old Windows MMC) that will fit better in today’s Enterprise setting. Also, easy integration with SaaS services is a needed upgrade.

The other thing that should be released into the wild is the Virtual Directory. It also provides powerful functionality for several use cases, from pass-through authentication to assisting in M&A use cases. It’s the perfect example of a “Black Box” offering that just works. It also makes it much easier to synchronize and cleanse data by representing many different back ends via the easy-to-consume LDAP standard.

It saddens me that SAP is choosing to move away from IDM, as one of the key selling points of SAP IDM is its ability to integrate seamlessly with the SAP ecosystem. I hope SAP will help all LCM/IGA vendors connect more easily with systems. SaaS integration should be easy or standards-based, but we still need to be concerned for organizations still using on-premises SAP tools.

SAP has indicated that Microsoft’s Entra ID will be the main partner in the future, but I hope they make this information open to all vendors and that there will be continuing support of standard protocols. This article gives me some hope, but actions speak louder than words. I do have some concerns that SAP, known as a vast software ecosystem that supports itself and tends to ignore the enterprise, is handing off to another large software provider whose management tools tend to support their software ecosystem first and consider the enterprise second. Let’s face it: most of Microsoft’s Identity and Access Management efforts have been about supporting the Office 365 family of products. Don’t get me wrong; it’s better than SAP in this regard, but it’s not that high of a level to meet. For what it’s worth, I am guardedly optimistic, but I always try to remain hopeful.

Finally, I think it’s important to thank the IDM team in Sofia for all their hard work over the years, which, of course, would not have been possible without the vision and effort of the original MaXware team based in Trondheim, Norway, and associated teams in the UK, Australia, and the US. The production from these small teams helped define what Identity Management is to this day.

Will this be my last blog entry on the topic of SAP IDM? I don’t know. Part of it will depend on if there are any moves towards the Open Source world. There have been at least three times in my life when I thought I was done with this tool, and deep down, I’m pretty sure there is a little more in my future. 

In the meantime, I hope to resume blogging more regarding the Identity and Access Management field in the near future. Time will tell.




Kent Bull

CESR enters provisional status in IANA Media Type Registry

Registration of the composable event streaming representation (CESR) format in the IANA Media Type Registry shows a recent development of the key event receipt infrastructure (KERI) and authentic chained data containers (ACDC) space and how the space is growing. See the following link for the official entry: IANA Media Type […]

Registration of the composable event streaming representation (CESR) format in the IANA Media Type Registry shows a recent development of the key event receipt infrastructure (KERI) and authentic chained data containers (ACDC) space and how the space is growing.

See the following link for the official entry: IANA Media Type Registry entry for CESR (application/cesr)

Thursday, 15. February 2024

Jon Udell

Creating a GPT Assistant That Writes Pipeline Tests

Here’s the latest installment in the series on working with LLMS: Creating a GPT Assistant That Writes Pipeline Tests. Once you get the hang of writing these tests, it’s mostly boilerplate, so I figured my team of assistants could help. I recruited Cody, GitHub Copilot, and Unblocked — with varying degrees of success. Then I … Continue reading Creating a GPT Assistant That Writes Pipeline Tests

Here’s the latest installment in the series on working with LLMS: Creating a GPT Assistant That Writes Pipeline Tests.

Once you get the hang of writing these tests, it’s mostly boilerplate, so I figured my team of assistants could help. I recruited Cody, GitHub Copilot, and Unblocked — with varying degrees of success. Then I realized I hadn’t yet tried creating a GPT. As OpenAI describes them, “GPTs are custom versions of ChatGPT that users can tailor for specific tasks or topics by combining instructions, knowledge, and capabilities.”

The rest of the series:

1 When the rubber duck talks back

2 Radical just-in-time learning

3 Why LLM-assisted table transformation is a big deal

4 Using LLM-Assisted Coding to Write a Custom Template Function

5 Elevating the Conversation with LLM Assistants

6 How Large Language Models Assisted a Website Makeover

7 Should LLMs Write Marketing Copy?

8 Test-Driven Development with LLMs: Never Trust, Always Verify

9 Learning While Coding: How LLMs Teach You Implicitly

10 How LLMs Helped Me Build an ODBC Plugin for Steampipe

11 How to Use LLMs for Dynamic Documentation

12 Let’s talk: conversational software development

13 Using LLMs to Improve SQL Queries

14 Puzzling over the Postgres Query Planner with LLMs

15 7 Guiding Principles for Working with LLMs

16 Learn by Doing: How LLMs Should Reshape Education

17 How to Learn Unfamiliar Software Tools with ChatGPT

Tuesday, 13. February 2024

Moxy Tongue

Own Your Own AI

Working on it.. OYO AI by kidOYO® Learning Software, Education Services

Working on it.. OYO AI by kidOYO® Learning Software, Education Services

Monday, 12. February 2024

Moxy Tongue

Sovereign AI

In 2024, the utility of words yielding philosophical clarity that becomes embedded into the design of systems being deployed globally, and Nationally, yields methods that must be structured accurately in order to abide by the Sovereign systems they serve. In America, people own root authority, or the Sovereign infrastructure does not confer accuracy for, of, by human use. Data is the life blood o

In 2024, the utility of words yielding philosophical clarity that becomes embedded into the design of systems being deployed globally, and Nationally, yields methods that must be structured accurately in order to abide by the Sovereign systems they serve.

In America, people own root authority, or the Sovereign infrastructure does not confer accuracy for, of, by human use. Data is the life blood of AI systems. Data structure yields Sovereign results, and across our fast advancing world, inaccuracy deconstructs faster than accuracy builds accurately. The time has come for open transparent accuracy in the data structure of Soveriegnty itself to be laid bare, enabling the development of "Sovereign AI" upon foundations that serve people.

Many moons ago, this structural conversation began in the world of identity management. Professionally-deployed systems were challenged to confront inaccuracies in their modeling of human identity. Ubiquitously, people were no longer being conveyed structural constraints ensuring the structural accuracy of their root administrative authority over data systems of ultimate importance to their Sovereign participation and administration under well-founded laws that were crafted pre-tcp/ip, pre-digital data.

Identity systems have been challenged now for over 20 years to align their practices to the people they service. The work is not done. Self-Sovereign ID principles that emerged here on this blog, led to decentralized identity methods and practices advancing for developer use, and into general awareness by a population that is intensely interested in digital frontiers where their lives meet opportunity, security, and civil system integrity. The fire walls of Sovereign integrity, having been breached many times in consequential ways, started exposing their own structural deficiencies.

Enter AI: human identity that primarily exists in a database-driven system, and is founded on an old-era of physical presence, is now the domain of AI. Human beings can not compete structurally here, as AI derives utility that people provide, and far too often, provide ignorantly, without much personal insight or accountability for the structural choices conveyed upon them. Laws, as dependencies function, evolve at a much slower pace, and seem to lack insight into the structural underpinnings of identity silos that tcp/ip was advanced to break down and add utility too. Unfortunately, protections were not advanced with the same insight, and civil society is finding itself in a reactive mode, reacting to change like a surfer riding a wave, rather than a break wall securing civil participation in an AI-enabled society.

This is the moment. AI Sovereignty has a basic and tremendously important dependency in American civil society: people own root. 

If the data structure of human participation in America does not convey this basic structural reality, then people do not exist in a civil society, as defined by founding documents, intent, and Constitutional reach. Work is underway on this vector, and as always, the resulting choices and structures advanced will yield the results being pursued. The question on the table being asked is simple: do innovators understand what it means in structural Terms to ensure that people own root authority? 

"Own Your Own AI"


Damien Bod

Using Blob storage from ASP.NET Core with Entra ID authentication

This article shows how to implement a secure upload and a secure download in ASP.NET Core using Azure blob storage. The application uses Microsoft Entra ID for authentication and also for access to the Azure Blob storage container. Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage Blogs in this series Security architecture The application is setup to store the file uploads […]

This article shows how to implement a secure upload and a secure download in ASP.NET Core using Azure blob storage. The application uses Microsoft Entra ID for authentication and also for access to the Azure Blob storage container.

Code: https://github.com/damienbod/AspNetCoreEntraIdBlobStorage

Blogs in this series Using Blob storage from ASP.NET Core with Entra ID authentication Delegated read and application write access to blob storage using ASP.NET Core with Entra ID authentication Multi client blob storage access using ASP.NET Core with Entra ID authentication and RBAC Security architecture

The application is setup to store the file uploads to an Azure Blob storage container. The authentication uses delegated only flows. A user can authenticate into the application using Microsoft Entra ID. The Azure App registration defines App roles to use for access authorization. The roles are used in the enterprise application. Security groups link the users to the roles. The security groups are used in the Azure Blob container where the RBAC is applied using the groups. A SQL database is used to persist the meta data and integrate into the other parts of the application.

Setting up Azure Blob storage

Two roles were created in the Azure App registration. The roles are assigned to groups in the Enterprise application. The users allowed to used to Azure Blob storage are assigned to the groups.

The groups are then used to apply the RBAC roles in the Azure Blob container. The Storage Blob Data Contributor and the Storage Blob Data Reader roles are used.

Authentication

Microsoft Entra ID is used for authentication and implemented using the Microsoft.Identity.Web Nuget packages. The is a standard implementation. Two policies were created to validate the two different roles used in this solution.

string[]? initialScopes = configuration.GetValue<string> ("AzureStorage:ScopeForAccessToken")?.Split(' '); services.AddMicrosoftIdentityWebAppAuthentication(configuration) .EnableTokenAcquisitionToCallDownstreamApi(initialScopes) .AddInMemoryTokenCaches(); services.AddAuthorization(options => { options.AddPolicy("blob-one-read-policy", policyBlobOneRead => { policyBlobOneRead.RequireClaim("roles", ["blobonereadrole", "blobonewriterole"]); }); options.AddPolicy("blob-one-write-policy", policyBlobOneRead => { policyBlobOneRead.RequireClaim("roles", ["blobonewriterole"]); }); }); services.AddRazorPages().AddMvcOptions(options => { var policy = new AuthorizationPolicyBuilder() .RequireAuthenticatedUser() .Build(); options.Filters.Add(new AuthorizeFilter(policy)); }).AddMicrosoftIdentityUI();

Upload

The application uses the IFormFile interface with the file payload and uploads the file to Azure Blob storage. The BlobClient is setup to use Microsoft Entra ID and the meta data is added to the blob.

public BlobDelegatedUploadProvider(DelegatedTokenAcquisitionTokenCredential tokenAcquisitionTokenCredential, IConfiguration configuration) { _tokenAcquisitionTokenCredential = tokenAcquisitionTokenCredential; _configuration = configuration; } [AuthorizeForScopes(Scopes = ["https://storage.azure.com/user_impersonation"])] public async Task<string> AddNewFile(BlobFileUploadModel blobFileUpload, IFormFile file) { try { return await PersistFileToAzureStorage(blobFileUpload, file); } catch (Exception e) { throw new ApplicationException($"Exception {e}"); } } private async Task<string> PersistFileToAzureStorage( BlobFileUploadModel blobFileUpload, IFormFile formFile, CancellationToken cancellationToken = default) { var storage = _configuration.GetValue<string>("AzureStorage:StorageAndContainerName"); var fileFullName = $"{storage}/{blobFileUpload.Name}"; var blobUri = new Uri(fileFullName); var blobUploadOptions = new BlobUploadOptions { Metadata = new Dictionary<string, string?> { { "uploadedBy", blobFileUpload.UploadedBy }, { "description", blobFileUpload.Description } } }; var blobClient = new BlobClient(blobUri, _tokenAcquisitionTokenCredential); var inputStream = formFile.OpenReadStream(); await blobClient.UploadAsync(inputStream, blobUploadOptions, cancellationToken); return $"{blobFileUpload.Name} successfully saved to Azure Blob Storage Container"; }

The DelegatedTokenAcquisitionTokenCredential class is used to get access tokens for the blob upload or download. This uses the existing user delegated session and creates a new access token for the blob storage access.

using Azure.Core; using Microsoft.Identity.Client; using Microsoft.Identity.Web; namespace DelegatedEntraIDBlobStorage.FilesProvider.AzureStorageAccess; public class DelegatedTokenAcquisitionTokenCredential : TokenCredential { private readonly ITokenAcquisition _tokenAcquisition; private readonly IConfiguration _configuration; public DelegatedTokenAcquisitionTokenCredential(ITokenAcquisition tokenAcquisition, IConfiguration configuration) { _tokenAcquisition = tokenAcquisition; _configuration = configuration; } public override AccessToken GetToken(TokenRequestContext requestContext, CancellationToken cancellationToken) { throw new NotImplementedException(); } public override async ValueTask<AccessToken> GetTokenAsync(TokenRequestContext requestContext, CancellationToken cancellationToken) { string[]? scopes = _configuration["AzureStorage:ScopeForAccessToken"]?.Split(' '); if (scopes == null) { throw new Exception("AzureStorage:ScopeForAccessToken configuration missing"); } AuthenticationResult result = await _tokenAcquisition .GetAuthenticationResultForUserAsync(scopes); return new AccessToken(result.AccessToken, result.ExpiresOn); } }

Download

The download creates a BlobClient using the user delegated existing session. The file is downloaded directly.

using Azure.Storage.Blobs; using Azure.Storage.Blobs.Models; using Microsoft.Identity.Web; namespace DelegatedEntraIDBlobStorage.FilesProvider.AzureStorageAccess; public class BlobDelegatedDownloadProvider { private readonly DelegatedTokenAcquisitionTokenCredential _tokenAcquisitionTokenCredential; private readonly IConfiguration _configuration; public BlobDelegatedDownloadProvider(DelegatedTokenAcquisitionTokenCredential tokenAcquisitionTokenCredential, IConfiguration configuration) { _tokenAcquisitionTokenCredential = tokenAcquisitionTokenCredential; _configuration = configuration; } [AuthorizeForScopes(Scopes = ["https://storage.azure.com/user_impersonation"])] public async Task<Azure.Response<BlobDownloadInfo>> DownloadFile(string fileName) { var storage = _configuration.GetValue<string>("AzureStorage:StorageAndContainerName"); var fileFullName = $"{storage}/{fileName}"; var blobUri = new Uri(fileFullName); var blobClient = new BlobClient(blobUri, _tokenAcquisitionTokenCredential); return await blobClient.DownloadAsync(); } } Notes

The architecture is simple and has the base features required for a secure solution. Data protection and virus scanning needs to be applied to the files and this can be configured in the Azure Blob storage. The access is controlled to the users in the group. If this needs to be controlled more, the write access can be removed from the users and switched to a service principal. This can have both security advantages and disadvantages. Multiple clients might also need access to files in this solution and the security needs to be enforced. This requires further architecture changes.

Links

https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-access-azure-active-directory

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction

https://github.com/AzureAD/microsoft-identity-web

Sunday, 11. February 2024

Werdmüller on Medium

A creative process

No apps; no frameworks; just space. Continue reading on Medium »

No apps; no frameworks; just space.

Continue reading on Medium »

Saturday, 10. February 2024

Heres Tom with the Weather

Phishing Mitigation for Mastodon.social

When a person is already logged into a mastodon instance, if they visit some pages on their instance associated with a user from another server, they are not redirected to the remote server because it is easier to interact with the remote user with their existing local session. However, if a person without an account is just visiting or they have an account but are logged out, mastodon redirects

When a person is already logged into a mastodon instance, if they visit some pages on their instance associated with a user from another server, they are not redirected to the remote server because it is easier to interact with the remote user with their existing local session. However, if a person without an account is just visiting or they have an account but are logged out, mastodon redirects them to the remote server presumably because mastodon doesn’t know whether they have a local account and visiting the remote server will have the complete and authoritative data for that remote user.

A welcome update to mastodon.social (included in 4.3.0-nightly) is a warning presented to visitors or logged out users before mastodon redirects them to a remote server for the original page. The code for Add confirmation when redirecting logged-out requests to permalink is particularly relevant to mastodon.social compared to other fediverse instances as mastodon.social has become a relatively big target for phishing. It’s a good bet that if someone is navigating the fediverse that their account is on mastodon.social. So, if an arbitrary victim is logged out of their mastodon.social account and visits a mastodon.social page belonging to the attacker, prior to this mitigation, mastodon.social would automatically redirect the victim to the attacker’s page which might be a fake login form to trick the victim into submitting their login credentials to the attacker’s site. Unfortunately, a significant percentage of people will submit the form.

One could imagine mastodon.social maintaining a list of trusted servers for automatic redirects but that would be an undesirable hornet’s nest and it’s not a bad thing when web surfers are conscious of the trust boundaries on the web.

Friday, 09. February 2024

Phil Windleys Technometria

Zero Trust with Zero Data

The physical world is full of zero trust examples, but they gather attributes for the access control decisions in a very different way than we're used to online. Presenting your ID to buy beer is used so often as an example of how verifiable credentials work that it's cliche. Cliche or not, there's another aspect of using an ID to buy beer that I want to focus on: it's an excellent example of

The physical world is full of zero trust examples, but they gather attributes for the access control decisions in a very different way than we're used to online.

Presenting your ID to buy beer is used so often as an example of how verifiable credentials work that it's cliche. Cliche or not, there's another aspect of using an ID to buy beer that I want to focus on: it's an excellent example of zero trust

Zero Trust operates on a simple, yet powerful principle: “assume breach.” In a world where network boundaries are increasingly porous and cyber threats are more evasive than ever, the Zero Trust model centers around the notion that no one, whether internal or external, should be inherently trusted. This approach mandates continuous verification, strict access controls, and micro-segmentation, ensuring that every user and device proves their legitimacy before gaining access to sensitive resources. If we assume breach, then the only strategy that can protect the corporate network, infrastructure, applications, and people is to authorize every access.

From Zero Trust
Referenced 2024-02-09T08:25:55-0500

The real world is full of zero trust examples. When we're controlling access to something in the physical world—beer, a movie, a boarding gate, points in a loyalty program, prescriptions, and so on—we almost invariably use a zero trust model. We authorize every access. This isn't surprising, the physical world is remarkably decentralized and there aren't many natural boundaries to exploit and artificial boundaries are expensive and inconvenient.

The other thing that's interesting about zero trust in the physical world is that authorization is also usually done using Zero Data. Zero data is a name StJohn Deakin gave to the concept of using data gathered just in time to make authorization and other decisions rather than relying on great stores of data. There are obvious security benefits from storing less data, but zero data also offers significantly greater convenience for people and organizations alike. To top all that off, it can save money by reducing the number of partner integrations (i.e., far fewer federations) and enable applications that have far greater scale.

Let's examine these benefits in the scenario I opened with. Imagine that instead of using a credential (e.g., driver's license) to prove your age when buying beer, we ran convenience stores like a web site. Before you could shop, you'd have to register an account. And if you wanted to buy beer, the company would have to proof the identity of the person to ensure they're over 21. Now when you buy beer at the store, you'd log in so the system could use your stored attributes to ensure you were allowed to buy beer.

This scenario is still zero trust, but not zero data. And it's ludicrous to imagine anyone would put up with it, but we do it everyday online. I don't know about you, but I'm comforted to know that every convenience store I visit doesn't have a store of all kinds of information about me in an account somewhere. Zero data stores less data that can be exploited by hackers (or the companies we trust with it).

The benefit of scale is obvious as well. In a zero data, zero trust scenario we don't have to have long-term transactional relationships with every store, movie, restaurant, and barber shop we visit. They don't have to maintain federation relationships with numerous identity providers. There are places where the ability to scale zero trust really matters. For example, it's impossible for every hospital to have a relationship with every other hospital for purposes of authorizing access for medical personal who move or need temporary access. Similarly, airline personal move between numerous airports and need access to various facilities at airports.

How do we build zero data, zero trust systems? By using verifiable credentials to transfer attributes about their subject in a way that is decentralized and yet trustworthy. Zero data aligns our online existence more closely with our real-world interactions, fostering new methods of communication while decreasing the challenges and risks associated with amassing, storing, and utilizing vast amounts of data.

Just-in-time, zero data, attribute transfer can make many zero trust scenarios more realizable because it's more flexible. Zero trust with zero data, facilitated by verifiable credentials, represents a pivotal transition in how digital identity is used in authorization decisions. By minimizing centralized data storage and emphasizing cryptographic verifiability, this approach aims to address the prevalent challenges in data management, security, and user trust. By allowing online interactions to more faithfully follow established patterns of transferring trust from the physical world, zero trust with zero data promotes better security with increased convenience and lower cost. What's not to like?

You can get more detail on many of the concepts in this post like verifiable credentials in my new book Learning Digital Identity from O'Reilly Media.

Photo Credit: We ID Everyone from DALL-E (Public Domain) DALL-E apparently thinks a six-pack has 8 bottles but this was the best of several attempts.

Monday, 05. February 2024

Damien Bod

Secure an ASP.NET Core Blazor Web app using Microsoft Entra ID

This article shows how to implement an ASP.NET Core Blazor Web application using Microsoft Entra ID for authentication. Microsoft.Identity.Web is used to implement the Microsoft Entra ID OpenID Connect client. Code: https://github.com/damienbod/Hostedblazor8MeID Note: I based this implementation on the example provided by Tomás López Rodríguez and adapted it. Setup The Blazor Web application is an

This article shows how to implement an ASP.NET Core Blazor Web application using Microsoft Entra ID for authentication. Microsoft.Identity.Web is used to implement the Microsoft Entra ID OpenID Connect client.

Code: https://github.com/damienbod/Hostedblazor8MeID

Note: I based this implementation on the example provided by Tomás López Rodríguez and adapted it.

Setup

The Blazor Web application is an OpenID Connect confidential client (code flow, PKCE) which uses Microsoft Entra ID for authentication. An Azure App registration (Web configuration) is used to create the client and only delegated scopes are used. A secret is used to authenticate the application in development. Client assertions can be used in production deployments. NetEscapades.AspNetCore.SecurityHeaders is used to implement the security headers as best possible for Blazor Web. No identity management or user passwords are handled in the application.

The client part of the Blazor Web application can use the PersistentAuthenticationStateProvider class to read the user profile data.

This uses data from the server part implemented in the PersistingRevalidatingAuthenticationStateProvider class. See the code in the github repo.

OpenID Connect confidential client

The AddMicrosoftIdentityWebAppAuthentication method is used to implement the client authentication using the Microsoft.Identity.Web packages. I use a downstream API to force that the client uses code flow with PKCE instead of the implicit flow. Microsoft Graph is only requesting delegated user profile data.

// Add authentication services var scopes = builder.Configuration.GetValue<string>("DownstreamApi:Scopes"); string[] initialScopes = scopes!.Split(' '); builder.Services.AddMicrosoftIdentityWebAppAuthentication(builder.Configuration) .EnableTokenAcquisitionToCallDownstreamApi(initialScopes) .AddMicrosoftGraph("https://graph.microsoft.com/v1.0", scopes) .AddInMemoryTokenCaches();

The client automatically reads from the AzureAd configuration. This can be changed if you would like to update the product name. The client uses the standard Microsoft Entra ID setup. You need to add the permissions in the Azure App registration created for this application.

"AzureAd": { "Instance": "https://login.microsoftonline.com/", "Domain": "[Enter the domain of your tenant, e.g. contoso.onmicrosoft.com]", "TenantId": "[Enter 'common', or 'organizations' or the Tenant Id (Obtained from the Azure portal. Select 'Endpoints' from the 'App registrations' blade and use the GUID in any of the URLs), e.g. da41245a5-11b3-996c-00a8-4d99re19f292]", "ClientId": "[Enter the Client Id (Application ID obtained from the Azure portal), e.g. ba74781c2-53c2-442a-97c2-3d60re42f403]", "ClientSecret": "[Copy the client secret added to the app from the Azure portal]", "ClientCertificates": [ ], // the following is required to handle Continuous Access Evaluation challenges "ClientCapabilities": [ "cp1" ], "CallbackPath": "/signin-oidc" }, "DownstreamApi": { "Scopes": "User.ReadBasic.All user.read" },

Login and Logout

An AuthenticationExtensions class was used to implement the login and the logout for the application. The Login method is an HTTP GET request which redirects to the OpenID Connect server. The Logout method is an authentication HTTP POST request which requires CSRF protection and accepts no parameters. The return URL to the unauthenticated signed out page is fixed and so no open redirect attacks are possible. The login cleans up the local cookies as well as a redirect to the identity provider to logout on Microsoft Entra ID.

using Microsoft.AspNetCore.Authentication.Cookies; using Microsoft.AspNetCore.Authentication.OpenIdConnect; using Microsoft.AspNetCore.Authentication; namespace BlazorWebMeID; public static class AuthenticationExtensions { public static WebApplication SetupEndpoints(this WebApplication app) { app.MapGet("/Account/Login", async (HttpContext httpContext, string returnUrl = "/") => { await httpContext.ChallengeAsync(OpenIdConnectDefaults.AuthenticationScheme, new AuthenticationProperties { RedirectUri = !string.IsNullOrEmpty(returnUrl) ? returnUrl : "/" }); }); app.MapPost("/Account/Logout", async (HttpContext httpContext) => { var authenticationProperties = new AuthenticationProperties { RedirectUri = "/SignedOut" }; await httpContext.SignOutAsync(OpenIdConnectDefaults.AuthenticationScheme, authenticationProperties); await httpContext.SignOutAsync(CookieAuthenticationDefaults.AuthenticationScheme); }).RequireAuthorization(); return app; } }

Security headers

The security headers are used to protect the session. When using AddInteractiveWebAssemblyComponents mode, the script CSP header is really weak and adds little protection leaving the application open to numerous XSS, Javascript attacks. It is not possible to use CSP nonces with Blazor Web using the InteractiveWebAssemblyComponents mode, or I have not found a way to do this, as the Blazor Web components cannot read the HTTP headers in the response. A Blazor WASM hosted in an ASP.NET Core application can use CSP nonces and is a more secure application.

namespace HostedBlazorMeID.Server; public static class SecurityHeadersDefinitions { public static HeaderPolicyCollection GetHeaderPolicyCollection(bool isDev, string? idpHost) { ArgumentNullException.ThrowIfNull(idpHost); var policy = new HeaderPolicyCollection() .AddFrameOptionsDeny() .AddContentTypeOptionsNoSniff() .AddReferrerPolicyStrictOriginWhenCrossOrigin() .AddCrossOriginOpenerPolicy(builder => builder.SameOrigin()) .AddCrossOriginResourcePolicy(builder => builder.SameOrigin()) .AddCrossOriginEmbedderPolicy(builder => builder.RequireCorp()) .AddContentSecurityPolicy(builder => { builder.AddObjectSrc().None(); builder.AddBlockAllMixedContent(); builder.AddImgSrc().Self().From("data:"); builder.AddFormAction().Self().From(idpHost); builder.AddFontSrc().Self(); builder.AddStyleSrc().Self(); builder.AddBaseUri().Self(); builder.AddFrameAncestors().None(); // due to Blazor Web, nonces cannot be used with AddInteractiveWebAssemblyComponents mode. // weak script CSP.... builder.AddScriptSrc() .Self() // self required .UnsafeEval() // due to Blazor WASM .UnsafeInline(); // only a fallback for older browsers when the nonce is used }) .RemoveServerHeader() .AddPermissionsPolicy(builder => { builder.AddAccelerometer().None(); builder.AddAutoplay().None(); builder.AddCamera().None(); builder.AddEncryptedMedia().None(); builder.AddFullscreen().All(); builder.AddGeolocation().None(); builder.AddGyroscope().None(); builder.AddMagnetometer().None(); builder.AddMicrophone().None(); builder.AddMidi().None(); builder.AddPayment().None(); builder.AddPictureInPicture().None(); builder.AddSyncXHR().None(); builder.AddUsb().None(); }); if (!isDev) { // maxage = one year in seconds policy.AddStrictTransportSecurityMaxAgeIncludeSubDomains(); } policy.ApplyDocumentHeadersToAllResponses(); return policy; } }

Notes

I am starting to understand how Blazor Web works and have difficultly with the session state and sharing this between different components. Some basic browser security cannot be used, i.e. CSP nonces. The mixed mode has strange UI effects which I could not clean up.

There are now four types of Blazor applications.

Blazor WASM hosted in an ASP.NET Core application Blazor Server Blazor Web Blazor WASM standalone

Blazor WASM hosted in an ASP.NET Core application and Blazor Server can be secured in a good way using the recommended security best practices (OpenID Connect confidential client). Blazor Web can implement a confidential client but is missing the recommend script session protection. Blazor WASM standalone cannot implement the recommended authentication as it is a public application and should no longer be used in secure environments.

Links

https://github.com/CrahunGit/Auth0BlazorWebAppSample/tree/master/BlazorApp4

https://github.com/dotnet/blazor-samples/tree/main/8.0/BlazorWebAppOidc

https://github.com/AzureAD/microsoft-identity-web

https://github.com/andrewlock/NetEscapades.AspNetCore.SecurityHeaders

Friday, 02. February 2024

Just a Theory

Presentation: Introduction to the PGXN Architecture

I made a presentation on the PGXN architecture for the Tembo team.

As I started digging into the jobs and tools for the Postgres extension ecosystem as part of my new gig, I realized that most people have little knowledge of the PGXN architecture. I learned a lot designing PGXN and its services, and am quite pleased with where it ended up, warts and all. So I thought it worthwhile to put together a brief presentation on the fundamental design principals (static REST file API), inter-related services (root mirror, manager, API, site) and tools (CLI, CI/CD).

Yesterday, the Tembo blog published the presentation, including the video and slides, along with a high-level architecture diagram. I hope it’s a useful point of reference for the Postgres community as we look to better distribute extensions in the future.

More about… PGXN Software Architecture REST JSON Tembo

Thursday, 01. February 2024

Just a Theory

Contemplating Decentralized Extension Publishing

The Go package ecosystem uses distributed publishing to release modules without authentication or uploads. Could we do something similar for Postgres extensions?
TL;DR

As I think through the future of the Postgres extension ecosystem as a key part of the new job, I wanted to understand how Go decentralized publishing works. In this post I work it out, and think through how we might do something similar for Postgres extension publishing. It covers the Go architecture, namespacing challenges, and PGXS abuse; then experiments with URL-based namespacing and ponders reorganizing installed extension files; and closes with a high-level design for making it work now and in the future.

It is, admittedly, a lot, mainly written for my own edification and for the information of my fellow extension-releasing travelers.

I find it fascinating and learned a ton. Maybe you will too! But feel free to skip this post if you’re less interested in the details of the journey and want to wait for more decisive posts once I’ve reached the destination.

Introduction

Most language registries require developers to take some step to make releases. Many automate the process in CI/CD pipelines, but it requires some amount of effort on the developer’s part:

Register for an account Learn how to format things to publish a release Remember to publish again for every new version Create a pipeline to automate publishing (e.g., a GitHub workflow) Decentralized Publishing

Go decentralized publishing has revised this pattern: it does not require user registration or authentication to to publish a module to pkg.go.dev. Rather, Go developers simply tag the source repository, and the first time someone refers to the tag in Go tools, the Go module index will include it.

For example, publishing v1.2.1 of a module in the github.com/golang/example repository takes just three commands:

git tag v1.2.1 -sm 'Tag v1.2.1' git push --tags go list -m github.com/golang/example@v1.2.1

After a few minutes, the module will show up in the index and then on pkg.go.dev. Anyone can run go get -u github.com/golang/example to get the latest version. Go developers rest easy in the knowledge that they’re getting the exact module they need thanks to the global checksum database, which Go uses “in many situations to detect misbehavior by proxies or origin servers”.

This design requires go get to understand multiple source code management systems: it supports Git, Subversion, Mercurial, Bazaar, and Fossil.1 It also needs the go.mod metadata file to live in the project defining the package.

But that’s really it. From the developer’s perspective it could not be easier to publish a module, because it’s a natural extension of the module development tooling and workflow of committing, tagging, and fetching code.

Decentralized Extension Publishing

Could we publish Postgres extensions in such a decentralized pattern? It might look something like this:

The developer places a metadata file in the proper location (control file, META.json, Cargo.toml, whatever — standard TBD) To publish a release, the developer tags the repository and calls some sort of indexing service hook (perhaps from a tag-triggered release workflow) The indexing service validates the extension and adds it to the index

Note that there is no registration required. It simply trusts the source code repository. It also avoids name collision: github.com/bob/hash is distinct from github.com/carol/hash.

This design does raise challenges for clients, whether they’re compiling extensions on a production system or building binary packages for distribution: they have to support various version control systems to pull the code (though starting with Git is a decent 90% solution).

Namespacing

Then there’s name conflicts. Perhaps github.com/bob/hash and github.com/carol/hash both create an extension named hash. By the current control file format, the script directory and module path can use any name, but in all likelihood the use these defaults:

directory = 'extension' module_pathname = '$libdir/hash'

Meaning .sql files will be installed in the Postgres share/extension subdirectory — along with all the other installed extensions — and library files will be installed in the library directory along with all other libraries. Something like this:

pgsql ├── lib │ └── hash.so └── share └── extension │ └── hash.control │   ├── hash--1.0.0.sql └── doc └── hash.md

If both projects include, say, hash.control, hash--1.0.0.sql, and hash.so, the files from one will stomp all over the files of the other.

Installer Abuse

Go avoids this issue by using the domain and path from each package’s repository in its directory structure. For example, here’s a list of modules from google.golang.org repositories:

$ ls -1 ~/go/pkg/mod/google.golang.org api@v0.134.0 api@v0.152.0 appengine@v1.6.7 genproto genproto@v0.0.0-20230731193218-e0aa005b6bdf grpc@v1.57.0 grpc@v1.59.0 protobuf@v1.30.0 protobuf@v1.31.0 protobuf@v1.32.0

The ~/go/pkg/mod directory has subdirectories for each VCS host name, and each then subdirectories for package paths. For the github.com/bob/hash example, the files would all live in ~/go/pkg/mod/github.com/bob/hash.

Could a Postgres extension build tool follow a similar distributed pattern by renaming the control file and installation files and directories to something specific for each, say github.com+bob+hash and github.com+carol+hash? That is, using the repository host name and path, but replacing the slashes in the path with some other character that wouldn’t create subdirectories — because PostgreSQL won’t find control files in subdirectories. The control file entries for github.com/carol/hash would look like this:

directory = 'github.com+carol+hash' module_pathname = '$libdir/github.com+carol+hash'

Since PostgreSQL expects the control file to have the same name as the extension, and for SQL scripts to start with that name, the files would have to be named like so:

hash ├── Makefile ├── github.com+carol+hash.control └── sql └── github.com+carol+hash--1.0.0.sql

And the Makefile contents:

EXTENSION = github.com+carol+hash MODULEDIR = $(EXTENSION) DATA = sql/$(EXTENSION)--1.0.0.sql PG_CONFIG ?= pg_config PGXS := $(shell $(PG_CONFIG) --pgxs) include $(PGXS)

In other words, the extension name is the full repository host name and path and the Makefile MODULEDIR variable tells pg_config to put all the SQL and documentation files into a directories named github.com+carol+hash — preventing them from conflicting with any other extension.

Finally, the github.com+carol+hash.control file — so named becaus it must have the same name as the extension — contains:

default_version = '1.0.0' relocatable = true directory = 'github.com+carol+hash' module_pathname = '$libdir/github.com+carol+hash'

Note the directory parameter, which must match MODULEDIR from the Makefile, so that CREATE EXTENSION can find the SQL files. Meanwhile, module_pathname ensures that the library file has a unique name — the same as the long extension name — again to avoid conflicts with other projects.

That unsightly naming extends to SQL: using the URL format could get to be a mouthful:

CREATE EXTENSION "github.com+carol+hash";

Which is do-able, but some new SQL syntax might be useful, perhaps something like:

CREATE EXTENSION hash FROM "github.com+carol+hash";

Or, if we’re gonna really go for it, use slashes after all!

CREATE EXTENSION hash FROM "github.com/carol/hash";

Want to use both extensions but they have conflicting objects (e.g., both create a “hash” data type)? Put them into separatre schemas (assuming relocatable = true in the control file):

CREATE EXTENSION hash FROM "github.com/carol/hash" WITH SCHEMA carol; CREATE EXTENSION hash FROM "github.com/bob/hash" WITH SCHEMA bob; CREATE TABLE try ( h1 carol.hash, h2 bob.hash );

Of course it would be nice if PostgreSQL added support for something like Oracle packages, but using schemas in the meantime may be sufficient.

Clearly we’re getting into changes to the PostgreSQL core, so put that aside and we can just use long names for creating, modifying, and dropping extensions, but not necessarily otherwise:

CREATE EXTENSION "github.com+carol+hash" WITH SCHEMA carol; CREATE EXTENSION "github.com+bob+hash" WITH SCHEMA bob; CREATE EXTENSION "gitlab.com+barack+kicker_type"; CREATE TABLE try ( h1 carol.hash, h2 bob.hash kt kicker ); Namespacing Experiment

To confirm that this approach might work, I committed 24134fd and pushed it in the namespace-experiment branch of the semver extension. This commit changes the extension name from semver to github.com+theory+pg-semver, and follows the above steps to ensure that its files are installed with that name.

Abusing the Postgres extension installation infrastructure like this does work, but suffers from a number of drawbacks, including:

The extension name is super long, as before, but now so too are the files in the repository (as opposed to the installer renaming them on install). The shared library file has to have the long name, so therefore does the .c source file. The SQL files must all start with github.com+theory+pg-semver, although I skipped that bit in this commit; instead the Makefile generates just one from sql/semver.sql. Any previous installation of the semver type would remain unchanged, with no upgrade path. Changing an extension’s name isn’t a great idea.

I could probably script renaming and modifying file contents like this and make it part of the build process, but it starts to get complicated. We could also modify installers to make the changes, but there are a bunch of moving parts they would have to compensate for, and given how dynamic this can be (e.g., the semver Makefile reads the extension name from META.json), we would rapidly enter the territory of edge case whac-a-mole. I suspect it’s simply too error-prone.

Proposal: Update Postgres Extension Packaging

Perhaps the Go directory pattern could inspire a similar model in Postgres, eliminating the namespace issue by teaching the Postgres extension infrastructure to include all but one of the files for an extension in a single directory. In other words, rather than files distributed like so for semver:

pgsql ├── lib │ └── semver.so └── share └── extension │ └── semver.control │   ├── semver--0.32.1.sql │   ├── semver--0.32.0--0.32.1.sql └── doc └── semver.md

Make it more like this:

pgsql └── share └── extension └── github.com └── theory └── pg-semver └── extension.control └── lib │ └── semver.so └── sql │ └── semver--0.32.1.sql │ └── semver--0.32.0--0.32.1.sql └── doc └── semver.md

Or perhaps:

pgsql └── share └── extension └── github.com └── theory └── pg-semver └── extension.control └── semver.so └── semver--0.32.1.sql └── semver--0.32.0--0.32.1.sql └── semver.md

The idea is to copy the files exactly as they’re stored in or compiled in the repository. Meanwhile, the new semver.name file — the only relevant file stored outside the extension module directory — simply points to that path:

github.com/theory/pg-semver

Then for CREATE EXTENSION semver, Postgres reads semver.name and knows where to find all the files to load the extension.

This configuration would require updates to the control file, now named extension.control, to record the full package name and appropriate locations. Add:

name = 'semver' package = 'github.com/theory/pg-semver'

This pattern could also allow aliasing. Say we try to install a different semver extension from github.com/example/semver. This is in its extension.control file:

name = 'semver' package = 'github.com/example/pg-semver'

The installer detects that semver.name already exists for a different package and raises an error. The user could then give it a different name by running something like:

make install ALIAS_EXTENSION_NAME=semver2

This would add semver2.name right next to semver.name, and its contents would contain github.com/example/semver, where all of its files are installed. This would allow CREATE EXTENSION semver2 to load the it without issue (assuming no object conflicts, hopefully resolved by relocate-ability).

I realize a lot of extensions with libraries could wreak some havoc on the library resolver having to search so many library directories, but perhaps there’s some way around that as well? Curious what techniques experienced C developers might have adopted.

Back to Decentralized Publishing

An updated installed extension file structure would be nice, and is surely worth a discussion, but even if it shipped in Postgres 20, we need an updated extension ecosystem today, to work well with all supported versions of Postgres. So let’s return to the idea of decentralized publishing without such changes.

I can think of two pieces that’d be required to get Go-style decentralized extension publishing to work with the current infrastructure.

Module Uniqueness

The first is to specify a new metadata field to be unique for the entire index, and which would contain the repository path. Call it module, after Go (a single Git repository can have multiple modules). In PGXN Meta Spec-style JSON it’d look something like this:

{ "module": "github.com/theory/pg-semver", "version": "0.32.1", "provides": { "semver": { "abstract": "A semantic version data type", } } }

Switch from the PGXN-style uniqueness on the distribution name (usually the name of the extension) and let the module be globally unique. This would allow another party to release an extension with the same name. Even a fork where only the module is changed:

{ "module": "github.com/example/pg-semver", "version": "0.32.1", "provides": { "semver": { "abstract": "A semantic version data type", } } }

Both would be indexed and appear under the module name, and both would be find-able by the provided extension name, semver.

Where that name must still be unique is in a given install. In other words, while github.com/theory/pg-semver and github.com/example/pg-semver both exist in the index, the semver extension can be installed from only one of them in a given Postgres system, where the extension name semver defines its uniqueness.

This pattern would allow for much more duplication of ideas while preserving the existing per-cluster namespacing. It also allows for a future Postgres release that supports something like the flexible per-cluster packaging as described above.2

Extension Toolchain App

The second piece is an extension management application that understands all this stuff and makes it possible. It would empower both extension development workflows — including testing, metadata management, and releasing — and extension user workflows — finding, downloading, building, and installing.

Stealing from Go, imagine a developer making a release with something like this:

git tag v1.2.1 -sm 'Tag v1.2.1' git push --tags pgmod list -m github.com/theory/pg-semver@v1.2.1

The creatively named pgmod tells the registry to index the new version directly from its Git repository. Thereafter anyone can find it and install it with:

pgmod get github.com/theory/pg-semver@v1.2.1 — installs the specified version pgmod get github.com/theory/pg-semver — installs the latest version pgmod get semver — installs the latest version or shows a list of matching modules to select from

Any of these would fail if the cluster already has an extension named semver with a different module name. But with something like the updated extension installation locations in a future version of Postgres, that limitation could be loosened.

Challenges

Every new idea comes with challenges, and this little thought experiment is no exception. Some that immediately occur to me:

Not every extension can be installed directly from its repository. Perhaps the metadata could include a download link for a tarball with the results of any pre-release execution? Adoption of a new CLI could be tricky. It would be useful to include the functionality in existing tools people already use, like pgrx. Updating the uniqueness constraint in existing systems like PGXN might be a challenge. Most record the repository info in the resources META.json object, so it would be do-able to adapt into a new META format, either on PGXN itself or in a new registry, should we choose to build one. Getting everyone to standardize on standardized versioning tags might take some effort. Go had the benefit of controlling its entire toolchain, while Postgres extension versioning and release management has been all over the place. However PGXN long ago standardized on semantic versioning and those who have released extensions on PGXN have had few issues (one can still use other version formats in the control file, for better or worse). Some PGXN distributions have shipped different versions of extensions in a single release, or the same version as in other releases. The release version of the overall package (repository, really) would have to become canonical.

I’m sure there are more, I just thought of these offhand. What have you thought of? Post ’em if you got ’em in the #extensions channel on the Postgres Slack, or give me a holler on Mastodon or via email.

Or does it? Yes, it does. Although the Go CLI downloads most public modules from a module proxy server like proxy.golang.org, it still must know how to download modules from a version control system when a proxy is not available. ↩︎

Assuming, of course, that if and when the Postgres core adopts more bundled packaging that they’d use the same naming convention as we have in the broader ecosystem. Not a perfectly safe assumption, but given the Go precedent and wide adoption of host/path-based projects, it seems sound. ↩︎

More about… Postgres PGXN Extensions Go Packaging Distributed Publishing

Wednesday, 31. January 2024

Just a Theory

PGXN Tools v1.4

The pgxn-tools Docker image has seen some recent bug fixes and improvements.

Over on the PGXN Blog I’ve posted a brief update on recent bug fixes and improvements to the pgxn-tools Docker image, which is used fairly widely these days to test, bundle, and release Postgres extensions to PGXN. This fix is especially important for Git repositories:

v1.4.1 fixes an issue where git archive was never actually used to build a release zip archive. This changed at some point without noticing due to the introduction of the safe.directory configuration in recent versions of Git. Inside the container the directory was never trusted, and the pgxn-bundle command caught the error, decided it wasn’t working with a Git repository, and used the zip command, instead.

I also posted a gist listing PGXN distributions with a .git directory.

More about… Postgres PGXN Docker GitHub Workflow

Mike Jones: self-issued

Invited OpenID Federation Presentation at 2024 FIM4R Workshop

The OpenID Federation editors were invited to give a presentation on OpenID Federation at the 18th FIM4R Workshop, which was held at the 2024 TIIME Unconference. Giuseppe De Marco, Roland Hedberg, John Bradley, and I tag-teamed the presentation, with Vladimir Dzhuvinov also participating in the Q&A. Topics covered included motivations, architecture, design decisions, capabilities, use […]

The OpenID Federation editors were invited to give a presentation on OpenID Federation at the 18th FIM4R Workshop, which was held at the 2024 TIIME Unconference. Giuseppe De Marco, Roland Hedberg, John Bradley, and I tag-teamed the presentation, with Vladimir Dzhuvinov also participating in the Q&A. Topics covered included motivations, architecture, design decisions, capabilities, use cases, history, status, implementations, and people.

Here’s the material we used:

OpenID Federation 1.0: Shaping The Advanced Infrastructure of Trust

It was the perfect audience – chock full of people with practical federation deployment experience!


Fully-Specified Algorithms adopted by JOSE working group

The “Fully-Specified Algorithms for JOSE and COSE” specification has been adopted by the JOSE working group. See my original post about the spec for why fully-specified algorithms matter. Thanks to all who supported adoption and also thanks to those who provided useful detailed feedback that we can address in future working group drafts. The specification […]

The “Fully-Specified Algorithms for JOSE and COSE” specification has been adopted by the JOSE working group. See my original post about the spec for why fully-specified algorithms matter. Thanks to all who supported adoption and also thanks to those who provided useful detailed feedback that we can address in future working group drafts.

The specification is available at:

https://www.ietf.org/archive/id/draft-ietf-jose-fully-specified-algorithms-00.html

Wrench in the Gears

Magical Realism Among The Raindrops

I woke up with piles of boxes filling one end of my living room. They’d been there since last weekend when I hauled them up from the basement and down from the third floor. Given our recent investigations into fascia, the psyche, and computation, perhaps the physicality of my cardboard wrangling was a subconscious tactic [...]

I woke up with piles of boxes filling one end of my living room. They’d been there since last weekend when I hauled them up from the basement and down from the third floor. Given our recent investigations into fascia, the psyche, and computation, perhaps the physicality of my cardboard wrangling was a subconscious tactic to help me process the loss. Contained within an assemblage of Playmobil, YA fantasy novels, vintage ephemera, and yellowed diplomas were the remaining belongings of two-thirds of my tiny nuclear family. At 8:30am movers were scheduled to arrive to whisk everything to storage, so I made sure to be up and out of the house early. Everything was organized and labelled with notes for my husband on what was to go, and which items were to be relocated in the house. It’s what moms do, right?

The night before I tucked away “Piggy,” my child’s beloved stuff animal companion, threadbare again after one flesh-colored terry cloth “chest” transplant. I gave him one last big hug and placed him gently next to “Brown Wolf,” both of them atop a quilt of wild 70s prints made by my grandmother in the summer camp trunk covered with Dolly’s Dairy Bar stickers, a Brevard, NC staple. It is the end of an era. I wept softly and alone in the chill of the sewing room.

After I had a few years of parenting under my belt, I would proffer parents of infants this insight – no matter how terrible or how wonderful life is at any given moment, the one constant is change. Cherish the sweet moments and know that the miserable ones will pass eventually. It may seem like forever, but it isn’t. I still think it’s solid advice, though bittersweet all the same. I’ve come to accept the situation and scrabble for scraps of grace to get through another day until the house sells and I can start to start over. We are still human, right? Feeling all of the big feelings is part of the job description, and yes, I’m here for it.

While packing up some lovely textiles woven by my husband’s late mother, I came across a few concert t-shirts from back in the day when he wore army jackets and his hair longish and was a DJ for late night hardcore shows that aired on college radio stations. Me? I was pretty square, naive, knew very little of the world. We met during a study abroad semester in Venice. On foggy fall nights we would walk holding hands to the tip of Dorsoduro, past the Salute plague church, to “the point,” Punta Della Dogana – young love. There we would sit watching the lights edging St. Mark’s Square where the Canale Grande met La Giudecca, the deep channel.

Back then I had no conception of water memory, Neptune,  psi, or waves of consciousness. Unseen from street level, Atlas and Fortuna kept watch over the point, the latter holding a rudder aloft showing the direction of the wind, and our fate. I can look back and see my course was set when I was just a callow child off on a big adventure that would eventually sweep me to the City of Brotherly Love nestled between the Delaware and the Schuylkill, and then on, apparently, to the very different burbling 4,000-year-old waters of the Ozarks.

A few years back, my friend Dru shared with me the Chinese parable of the man who lost his horse. The gist was that when something happens in our lives that is apparently good or apparently bad, it is best to keep a neutral outlook, because the good might lead to the bad, or the bad to the good. Life cycles up and down. That our lives together started out under the globe of the heavens held by two atlas figures topped by a seventeenth century statue of Fortune charting human weather, commerce, navigation, bureaucratic systems…it’s quite a perfect summation of the tapestry I’ve been weaving without quite being aware of it until now.

Good? Bad? I guess I shall stay the course and see what comes next and try to hold onto my sense of wonder and playfulness as the trickster energy bubbles up around me.

I texted my soon-to-be-ex-husband a picture of “Misfits: Legacy of Brutality,” saying it seems the tables have turned. Now he’s an institutional administrator while I’ve gradually slid over into the outlier spot. Earlier in the process of letting go I spent a few days sifting through drawers of old cards and letters and report cards and crayon drawings, the fleeting realms of moms. Back then I was still a beloved wife and mother and daughter. The cards said so, at least. Now, I am an out-of-tune, dissonant dissident in a world rushing obliviously forward. Only a handful of people seem to be able to see and comprehend the layers of logic protocols and invisible sensors bathing us in invisible harmonic frequencies and smart cybernetic governance to tap into our soulful connection to the divine.

This is the world my child will navigate. Capably, I have no doubt.

And me? I will watch the phase shift coalesce from the sidelines. If I’m lucky I will view the etheric confluence barefoot peeking over a bed of glorious rainbow zinnias with a tomato sandwich in hand, juice running down my forearm, and when the sandwich is done, I will put a kayak on my aging Subaru and point it towards the crystal clear waters of Lake Ouachita.

I have a memory of being in a circle of families. Our children were part of a semester school in the Pisgah Forest south of Asheville. We were about to entrust our precious ones to four months of expeditionary learning, which in retrospect wasn’t my best parenting choice but you do the best with the information you have at the time. One of the teachers read Khalil Gibran’s poem “On Children.”

 

And a woman who held a babe against her bosom said, Speak to us of Children.

And he said:

Your children are not your children.

They are the sons and daughters of Life’s longing for itself.

They come through you but not from you,

And though they are with you yet they belong not to you.

You may give them your love but not your thoughts,

For they have their own thoughts.

You may house their bodies but not their souls,

For their souls dwell in the house of tomorrow, which you cannot visit, not even in your dreams.

You may strive to be like them but seek not to make them like you.

For life goes not backward nor tarries with yesterday.

You are the bows from which your children as living arrows are sent forth.

The archer sees the mark upon the path of the infinite, and He bends you with

His might that His arrows may go swift and far.

Let your bending in the archer’s hand be for gladness.

For even as He loves the arrow that flies, so He loves also the bow that is stable.

 

I remember being nonplussed and unsettled. I wasn’t ready, but we never are. I understand more about information fields and the divine creation and energy than I did then. My child did come through me, and settled so far, so far away, launched from breast milk, home cooked dinners from our CSA farm box, hand sewn Halloween costumes, art classes, bedtime stories, family trips, walks to the library, and kisses on the brow, even after adopting the habit of pushing me away in annoyance.

With everything in order, I headed out to get a leisurely breakfast at a restaurant that was about a twenty-minute walk from home just off the Ben Franklin Parkway, Mr. Electric himself. There was a steady rain washing away last week’s snow leaving craggy miniature mountains of ice chips languishing on the soggy drifts of leaves that didn’t manage to get bagged up during the fall – crystalline phase shift. It was a mild, drizzly rain, the kind that has a hint of the spring that is to come. There wasn’t much wind, and the misty droplets coated all the edges, plants and buildings with sparkly beads of water – billions of miniscule lenses abstracting and expanding the limits of the ordinary material world. It was magical.

 

Ironically the umbrella I’d grabbed wasn’t my regular rainbow one, or even the boring black one, but another that was emblazoned with the UPenn Arts and Sciences logo on it. The past week or so I’ve spent some time learning about Mark Johnson, a philosopher from the University of Oregon who spent his career talking about the ways we use our body to make meaning in the world. You see, our intelligence isn’t just in our brains and the nervous system that keeps it jumping, it’s also in our body and the liquid crystal fascia is an amazing part of that system. Art, culture, creativity, sports, dance are all gifts we toss back and forth with the quantum field of our collective consciousness.

These goofballs really want to meld all of that with natural philosophy, STEM we call it now, to create a unified computational system. This goal of melding social and natural sciences extends back at least to Leibniz who sought to create a universal computer where the grace of the human soul might be leveraged to shine a light on the secrets of the cosmos. I don’t think it’s likely to work, at least not how they think, but I chuckled all the same that the umbrella I’d unthinkingly grabbed on the way out the door had a special message for me. I was walking away from the life represented by that UPenn logo and soon heading south into the unknown, and the water was there as my witness.

In his “Stalking the Wild Pendulum” Itzhak Bentov wrote of our lives as embodied information organizers, pattern seekers, and meaning makers. Bentov felt that as we make sense of the material world through our experiences, our epiphanies, heartbreaks, and joy, there are invisible tapestries woven on the warp and weft of psyche and biology. These complex works of art, infinitely varied, are gradually gifted to the universal mind upon the passing of our bodies. That vision really resonates with me, particularly in combination with Johnson’s theories about non-linguistic communication and the importance of art and culture and the movement of the body in carrying out that important work on both and individual and social level.

Stephers shared with me a fascinating paper about social systems and how we use artefacts to leave traces of ourselves on our environments and those traces then influence the behaviors of others. Our use of cultural artefacts to imprint our consciousness on the energetic fields of the universe that vibrates around us is similar to the concept of pheromones that coordinate the collective efforts of eusocial creatures like ants or termites. My sense is that Web3 hopes to make cultural artefacts calculable at scale, converting them into universal coordinating signals to coax digital harmonies from the remaining wild corners that are slowly being overtaken by silicon. I mapped out some notes on this in my Dallas Mythos map below. 

Source: https://embed.kumu.io/dc9b79f81e2bb35bc4fce22d59dde62b#untitled-map?s=bm9kZS1iU0kxa29DZA%3D%3D

What follows are images from my walk to breakfast and back, with a slight extension of the route to the former GlaxoSmithKline US headquarters that has since been turned into a String Theory charter school. This embodied meaning making exhorts us to go out into the world, the real world and see what’s out there to be seen! For me, since everything turned upside down, I’ve felt inclined to stay close to home and disconnect. Maybe it’s not a healthy way to be, but for right now cocooning feels right. Still, when I go out with intention and my eyes open, the universe tells me stories and leaves me clues. No, not like I’m “hearing” things or “getting downloads,” more like I’m just aligned with the symbolic language of the universe and open to exploring in a wondering, wandering way.

First, I encountered  a large puddle at an intersection. I paused for a few minutes watch the ripples of the raindrops and all of the rings that danced across the surface of the water. Bentov’s book spoke of information being stored in interference patterns. He described a three-pebble experiment where the ripples where flash frozen into a holographic information storage system in the water. In my case there weren’t pebbles, but dancing drops that captivated me.

And then just a half block down I had to stop again and take a picture of this surprising gift – a fish ornamented with spirals swimming through a colorful mosaic sea, as if to remind me that we don’t get to choose our river. We can, however, pay attention and find the best way to work with the flow.

And after I finished my mushroom toast, all the better to get in touch with the mycelial elements, I came out of the restaurant and saw that in the chainlink fence surrounding the empty lot across the street someone had created an impromptu party cup installation, mashing the plastic into the gaps to spell out the word “HOPE” in crude, but very legible letters. I smiled.

At this point I decided to go a bit further afield rather than head straight back to the house. I passed the Temple of the Church of Jesus Christ of Latter Day Saints, which stands about a block and a half from the Catholic Cathedral of Saints Peter and Paul.

And between the two on a sidewalk above the I-676 crosstown highway was a pretty lame art installation of Ben Franklin with his lightening bolts and keys, all electrical potential and trusted cryptography and Metaverse portals. It’s all right there if you have the eyes to see it. In the background is the former GlaxoSmithKline building.

I knew I wanted to take a photo of the Zenos Frudakis 2000 bronze sculpture “Freedom.” Zenos is the name of an Old Testament prophet featured in the Book of Mormon, which is interesting since you can see the temple from the piece. The composition was inspired by Rodin’s “Gates of Hell,” in turn inspired by Dante, which can be found a few blocks further along the Parkway.

I’d seen the piece a few times before in passing, but I’d never stopped to look at it very closely. I was kind of shocked.

The image conveys a person escaping the confines of the background block. For me, however, knowing the multiverse phase shift that is underway this “freedom,” especially one sponsored by a company working with Google Alphabet’s Verily on electroceutical development, feels a bit off.

Could it be the animal skull at the base?

Or the smirking cat staring out?

Or an incongruous jester, which is totally out of keeping wit the rest of the piece?

The texture of the figures spoke to me of fascia – movement, communication through embodiment, fractal tissues, today glistening with water.

Bodies at different scales embedded in their environment, a miniature twin tucked in one corner.

I felt drawn to reach out and put my hand on the outstretched hand. It reminded me of my dad, whose hands were so big and the way his fingers would curl down over mine and give them a firm squeeze. I miss that.

As I headed west to go home there was a