Beat me to asking this follow-up, though you linking additional resources is probably more effort than I would have put in. Thanks for that!
Comment on How does this pic show that Elon Musk doesnt know SQL?
spankmonkey@lemmy.world 1 week ago
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth; duplicates are expected of just parts of the key.
Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.
Note: if duplicate SSNs are accidentally issued my understanding is that they issue a new one to one of the people.
Q20: Are Social Security numbers reused after a person dies?
A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.
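For anyone who wants to see what the composite key from the quoted statement would mean in practice, here is a minimal sketch using Python's built-in sqlite3. The table name, column names, and sample values are all made up for illustration; nobody in this thread knows the real schema. It only shows the general behavior: with a key on (SSN, DOB), the same SSN can legitimately show up on more than one row, and only an exact repeat of the pair is rejected.

```python
import sqlite3

# Hypothetical schema: the real table layout is not known.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        ssn  TEXT NOT NULL,
        dob  TEXT NOT NULL,      -- ISO date string, e.g. '1950-01-01'
        name TEXT,
        PRIMARY KEY (ssn, dob)   -- composite key: the PAIR must be unique
    )
    """
)


def try_insert(row):
    """Insert a (ssn, dob, name) row; return False if the key rejects it."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


first_accepted = try_insert(("123456789", "1950-01-01", "A"))
# Same SSN, different DOB: allowed, because only part of the key repeats.
same_ssn_accepted = try_insert(("123456789", "1960-02-02", "B"))
# Same SSN and same DOB: rejected, because the whole key repeats.
exact_duplicate_accepted = try_insert(("123456789", "1950-01-01", "C"))

rows_with_ssn = conn.execute(
    "SELECT COUNT(*) FROM person WHERE ssn = ?", ("123456789",)
).fetchone()[0]
```

So a query that only counts repeated SSN values would flag this table even though the key constraint is doing exactly what it was declared to do.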
DahGangalang@infosec.pub 1 week ago
Lightor@lemmy.world 6 days ago
My guess would be around your note. If someone mistakenly has two SSNs (due to fraud, error, or name changes), combining DOB helps detect inconsistencies.
Some other possibilities, and I’m just throwing out ideas at this point:
- Adding DOB could help with manual lookups and verification.
- Using SSN + DOB ensures a standard key format across agencies, making it easier to link records.
- Prevents accidental duplication if an SSN is mistyped.
- Maybe the databases were optimized for fixed-length fields, and combining SSN + DOB fit within memory constraints.
- It was easier to locate records with a “human-readable” key, whereas something like a UUID is harder for humans to read or sift through.
halcyonloon@midwest.social 1 week ago
Take this with a grain of salt as I’m not a dev, but do work on CMS reporting for a health information tech company. Depending on how the database is designed an SSN could appear in multiple tables.
In my experience reduplication happens as part of generating a report so that all relevant data related to a key and scope of the report can be gathered from the various tables.
DahGangalang@infosec.pub 1 week ago
A given SSN appearing in multiple tables actually makes sense. To someone not familiar with SQL (i.e. at about my level of understanding), I could see that being misinterpreted as having multiple SSN repeated “in the database”.
Barbarian@sh.itjust.works 1 week ago
Theoretically, yeah, that’s one solution. The more reasonable thing to do would be to use the foreign key though. So, for example:
SSN Table: ID | SSN | Other info
Other Table: ID | SSN ID | Other info
When you want to connect them to have both sets of info, it’d be the following:
SELECT * FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
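If you want to actually run that, here is a rough sketch with Python's built-in sqlite3. The table and column names follow the example above; the column types, the UNIQUE constraint, and the sample rows are my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript(
    """
    CREATE TABLE SSN_Table (
        ID         INTEGER PRIMARY KEY,
        SSN        TEXT NOT NULL UNIQUE,
        Other_info TEXT
    );
    CREATE TABLE Other_Table (
        ID         INTEGER PRIMARY KEY,
        SSN_ID     INTEGER NOT NULL REFERENCES SSN_Table(ID),
        Other_info TEXT
    );
    -- Made-up sample data: one person, two related rows.
    INSERT INTO SSN_Table VALUES (1, '123456789', 'person record');
    INSERT INTO Other_Table VALUES (10, 1, 'first payment');
    INSERT INTO Other_Table VALUES (11, 1, 'second payment');
    """
)

# The join from the comment above, with explicit columns instead of *.
joined = conn.execute(
    """
    SELECT SSN_Table.SSN, Other_Table.Other_info
    FROM SSN_Table
    JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
    ORDER BY Other_Table.ID
    """
).fetchall()

# A row pointing at a person that does not exist is refused by the FK.
try:
    conn.execute("INSERT INTO Other_Table VALUES (12, 99, 'orphan')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
```

Note that the SSN shows up once per joined row in the result, even though it is stored exactly once in SSN_Table.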
schteph@lemmy.world 1 week ago
This is true, but there are many instances where denormalization makes sense and is frequently used.
A common example is a table that is frequently read. Instead of going to the “central” table the data is denormalized for faster access. This is completely standard practice for every large system.
There’s nothing inherently wrong with it, but it can be easily misused. With SSN, I’d think the most stupid thing to do is to use it as the primary key. The second one would be to ignore the security risks that are ingrained in an SSN. The federal government, being as large as it is, has instances of both, I’m sure. However, since Musky is using his posse of young, arrogant brogrammers, I’m positively certain they’re completely ignoring the security aspect.
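To make the denormalization trade-off concrete, here is a small sketch using Python's built-in sqlite3. Everything in it (the table names, the columns, the sample person) is hypothetical. The read-heavy table carries its own copy of the name so the report needs no join, and the price is that the copy can go stale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- "Central" table: the source of truth for a person.
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        ssn  TEXT NOT NULL UNIQUE,
        name TEXT NOT NULL
    );
    -- Read-heavy table: person_name is copied in on purpose (denormalized).
    CREATE TABLE payment_report (
        id           INTEGER PRIMARY KEY,
        person_id    INTEGER NOT NULL,
        person_name  TEXT NOT NULL,
        amount_cents INTEGER NOT NULL
    );
    INSERT INTO person VALUES (1, '123456789', 'Alex Example');
    INSERT INTO payment_report VALUES (1, 1, 'Alex Example', 150000);
    INSERT INTO payment_report VALUES (2, 1, 'Alex Example', 150000);
    """
)

# Fast path: the report reads one table, no join needed.
report = conn.execute(
    "SELECT person_name, amount_cents FROM payment_report ORDER BY id"
).fetchall()

# The cost: updating the central table does not touch the copies.
conn.execute("UPDATE person SET name = 'Alex Renamed' WHERE id = 1")
stale_rows = conn.execute(
    """
    SELECT COUNT(*)
    FROM payment_report AS pr
    JOIN person AS p ON p.id = pr.person_id
    WHERE pr.person_name <> p.name
    """
).fetchone()[0]
```

Whoever denormalizes takes on the job of keeping those copies in sync, which is where the "easily misused" part comes in.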
DahGangalang@infosec.pub 1 week ago
Yeah, databases are complicated and make my head hurt. Glancing through resources from other comments, I’m realizing I know next to nothing about database optimization. Like, my gut reaction to your comment is that it seems like unnecessary overhead to have that data across two tables - but if one sub-dept didn’t need access to the raw SSN, but did need access to less personal data, I could see those stored in separate tables.
But anyway, you’re helping clear things up for me. I really appreciate the pseudo code level example.
Ephera@lemmy.ml 1 week ago
The SSN is likely to appear in multiple tables, because they will reference a central table that ties it all together. This central table will likely only contain the SSN, the birth date (from what others have been saying), as well as potentially first and last name. In this table, the entries have to be unique.
But then you might have another table, like a table listing all the physical exams, which has the SSN to be able to link it to the person’s name, but ultimately just adds more information to this one person. It does not duplicate the SSN in a way that would be bad.
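A quick way to see the difference between "the SSN repeats in the database" and "the SSN is duplicated where it must be unique" is to run the same repeat check against both tables. This is only a sketch with Python's built-in sqlite3; the table names, columns, and rows are invented, and the central table is simplified to a single-column key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Central table: one row per person, SSN must be unique here.
    CREATE TABLE person (
        ssn        TEXT PRIMARY KEY,
        birth_date TEXT NOT NULL,
        first_name TEXT,
        last_name  TEXT
    );
    -- Detail table: many exams per person, so the SSN repeats by design.
    CREATE TABLE physical_exam (
        id        INTEGER PRIMARY KEY,
        ssn       TEXT NOT NULL REFERENCES person(ssn),
        exam_date TEXT NOT NULL
    );
    INSERT INTO person VALUES ('123456789', '1950-01-01', 'Alex', 'Example');
    INSERT INTO person VALUES ('987654321', '1962-05-17', 'Sam', 'Sample');
    INSERT INTO physical_exam VALUES (1, '123456789', '2020-01-15');
    INSERT INTO physical_exam VALUES (2, '123456789', '2021-01-20');
    INSERT INTO physical_exam VALUES (3, '987654321', '2021-03-02');
    """
)


def repeated_ssns(table):
    """List (ssn, count) pairs for SSNs appearing more than once in a table.

    The table name is interpolated into the SQL, so only pass trusted,
    hard-coded names (fine for a sketch, not for user input).
    """
    query = (
        f"SELECT ssn, COUNT(*) FROM {table} "
        "GROUP BY ssn HAVING COUNT(*) > 1 ORDER BY ssn"
    )
    return conn.execute(query).fetchall()


central_repeats = repeated_ssns("person")
exam_repeats = repeated_ssns("physical_exam")
```

The central table comes back clean, while the exam table reports a repeat that is completely expected: one person, two exams.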
spankmonkey@lemmy.world 1 week ago
It is common for long-lived databases with a rotating cast of devs to use different formats in different tables as well! One table might have it as a string, another might have it as a number, and another might have it with hyphens, all in the same database.
Hell, I work in a state agency and one of our older databases has a dozen tables with databases.
The main reason for the discrepancy is not looking at what was used before, or not understanding that the formatting can always be applied at display time, so there is no need to store the parentheses or hyphens in the database itself.
pixxelkick@lemmy.world 1 week ago
Okay, but if that happens, Musk is right that that’s a bit of a denormalization issue that maybe needs resolving.
SSNs should be stored as strings without any hyphen or additional markup, nothing else.
It’s more likely though it’s just a composite key…
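On the storage format point, here is roughly what "store plain digit strings, format on display" could look like in Python. The function names are made up, and this is only a sketch of the idea. It also shows why the "stored as a number" variant is a trap: an integer column drops leading zeros, so they have to be padded back.

```python
import re


def normalize_ssn(raw):
    """Return an SSN as a 9-digit string with no separators.

    Accepts the three shapes mentioned in the thread: a plain string,
    a hyphenated string, or an integer (which has lost any leading zeros).
    """
    if isinstance(raw, int):
        digits = str(raw).zfill(9)  # restore leading zeros lost as a number
    else:
        digits = re.sub(r"[\s-]", "", raw)  # drop hyphens and whitespace
    if not re.fullmatch(r"\d{9}", digits):
        raise ValueError("not a 9-digit SSN")
    return digits


def display_ssn(stored):
    """Format a stored 9-digit SSN as AAA-GG-SSSS for display only."""
    return f"{stored[:3]}-{stored[3:5]}-{stored[5:]}"


# The same made-up SSN in three different storage formats.
variants = ["012-34-5678", "012345678", 12345678]
normalized = [normalize_ssn(v) for v in variants]
shown = display_ssn(normalized[0])
```

All three variants collapse to the same stored value, and the hyphens only come back at display time.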
spankmonkey@lemmy.world 1 week ago
This is not what he is actively doing though. He isn’t trying to improve databases.
He is tearing down entire departments and agencies and using shit like this to justify it.