(I also still can't believe Bluesky hasn't put an aggressive rate limit on following and especially on starter pack use for new accounts, but that's an old complaint, heh)
If there are precisely four items in the list and the emoji are oddly connected, so that you have to squint to see how they relate, it's almost always inauthentic. But the style in general is cause for closer scrutiny in and of itself, just because inauthentic accounts use it so often.
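For anyone who wants to triage bios in bulk, here is a rough sketch of the structural part of that check. Everything in it is my own assumption, not anything Bluesky or a moderation tool provides: the function names `has_emoji` and `bio_flags` are made up, and "emoji" is approximated as any character in Unicode category `So`, which catches most pictographs but not all of them. It can't judge whether the emoji actually relates to the text, which is the part that needs a human squinting at it.

```python
import unicodedata


def has_emoji(text: str) -> bool:
    # Approximation: treat any "Symbol, other" (So) character as an emoji.
    # Covers most pictographs, but misses some and over-matches a few symbols.
    return any(unicodedata.category(ch) == "So" for ch in text)


def bio_flags(bio: str) -> dict:
    # Hypothetical helper: flags the "one characteristic plus emoji per line" style.
    lines = [ln.strip() for ln in bio.splitlines() if ln.strip()]
    emoji_per_line = len(lines) >= 2 and all(has_emoji(ln) for ln in lines)
    return {
        "emoji_per_line_style": emoji_per_line,
        "exactly_four_items": emoji_per_line and len(lines) == 4,
    }


if __name__ == "__main__":
    sample = "Coffee lover ☕\nDog mom 🐶\nStargazer ✨\nPlant parent 🌱"
    print(bio_flags(sample))
```

A hit here is only a prompt for a closer look at the account, not a verdict; plenty of real people write bios this way.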
I've observed that whatever software is generating many of the bios for these inauthentic accounts has ingested a bunch of the "one characteristic plus emoji per line" bios and is mixing and matching them for its own. I have gotten way more suspicious of that bio style lately.
Sorry, but did you read a single word I wrote?
I'm not talking about the past. I'm talking about now. Today. As in the racism (and sexism and assorted other bigotry) within the ranks here, now, today, is a problem.
I don’t know the future, and it’s possible that there’s some e-thing that would change my life, but honestly, at this point the best thing a techbro could make for my quality of life would be a really good sensor-based plant watering system, or a robot vacuum that doesn’t choke on pet hair.
This is where LAION-5B got in trouble: they made an effort to clean out the dataset but they did it with their own tools, presumably because they didn't know PhotoDNA existed or didn't want to go to the trouble of trying to get access, and their own tools were woefully inadequate at IDing stuff.
--as well as any novel-image detection that involves doing it themselves instead of using a system that has had multiparty validation/input (as much as I hate them, Thorn's Safer is the state of the art there).
At minimum, I would look for some statement in their methodology section that describes the efforts they've made to remove that material, and anything that doesn't involve running the image hashes against the known-images database from NCMEC/PhotoDNA should be suspect--
Basically, any dataset scraped from the internet *will* contain some amount of CSAM unless the assembler was both *extremely* diligent in sourcing and has cleaned it several times against both hash databases of known images and well-tuned ML-based novel image detection systems.