The people who called me crazy because “there’s no way your phone can be listening in on you all the time” are the same people who are going to be the most excited about this “feature”
How did these people expect “Hey Siri” / “Hey Google” to work?
In a perfect world, as they claim, it’s a secondary system listening that isn’t recording or transmitting anything, and is meant to be low power. If it hears the wake word, it wakes up the other mic and starts recording.
That’s how they claim the smart speakers work anyway.
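To make the claimed design concrete, here is a toy sketch in Python. It is only a model of the idea described above, not how any vendor actually implements it: the frames are plain strings rather than audio, and `gated_capture` and the wake phrase `"hey device"` are made-up names for illustration.

```python
def gated_capture(frames, wake_word="hey device"):
    """Toy model of a wake-word gate (hypothetical, for illustration only).

    Every frame is inspected by the low-power stage, but nothing is kept
    until the wake word is seen. After that, frames go to the recorder.
    """
    recorded = []
    awake = False
    for frame in frames:
        if awake:
            # Main stage: only frames after the wake word are retained.
            recorded.append(frame)
        elif frame == wake_word:
            # Low-power stage matched the wake word: switch the recorder on.
            awake = True
        # Otherwise the frame is dropped without being stored.
    return recorded


conversation = ["private chat", "more private chat", "hey device", "what's the weather"]
print(gated_capture(conversation))
```

The point of the sketch is that "always listening" and "always recording" are different things in this design: the gate has to hear everything, but in the claimed model it keeps nothing until it triggers.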
This would be different.
This was my understanding, but I just don’t believe it anymore. There have been way, way too many times my wife and I were talking about an incredibly niche thing that didn’t come up through the internet in any way, and lo and behold the algorithm presented those key words. Nobody will ever convince me it isn’t being done to some extent.
I had this happen many years ago, to the point there was no chance something wasn’t listening. We suspected it was my partner’s iPhone with Facebook installed, before they got better about preventing abuse like that, as it was a Facebook ad that showed up.
We’re talking about something where we never use the product, would never use the product, but it came up in a conversation just between the 2 of us, and there were ads the next day.
It happened a few times.
It doesn’t need to, that’s the issue, there is so much other data you are generating that can be harvested. Nothing you talk about is completely random, so it’s incredibly easy to build profiles about you, without listening to a single word.
I understand that’s the theory, but these situations were specifically not something that could be easily gleaned. We’re talking like reminiscing about things that happened in our pre-internet youth that there’s no record of anywhere and that came up randomly in conversation. I’m definitely aware of the dynamic. Even before the ubiquity of the internet, it’s true that sometimes companies would know people were pregnant before the person did, based on their purchasing profile. This wasn’t that though, there’s just no possible connection.
That happened a few times now, so pretty much nothing is going to convince me it’s not the case.
It can’t hear if it isn’t already listening.
“How they claim?” Is there no way to confirm that?
You look for network traffic. You might not be able to see inside the packets, but you can know when they’re sending packets, and how many. As far as I know, voice assistant systems that claim to use a secondary local circuit to detect calls are telling the truth.
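A rough sketch of what "when they’re sending packets, and how many" could look like in Python, assuming you already have a capture exported as (timestamp, size) pairs. The function name and the sample numbers are made up for illustration; getting the capture itself is a separate job.

```python
from collections import defaultdict


def bytes_per_window(packets, window_s=60):
    """Sum packet sizes into fixed time windows.

    packets: iterable of (timestamp_seconds, size_bytes) tuples.
    Returns {window_start_seconds: total_bytes}.
    """
    totals = defaultdict(int)
    for ts, size in packets:
        window_start = int(ts // window_s) * window_s
        totals[window_start] += size
    return dict(totals)


# Made-up capture: three small packets in the first minute, one larger one later.
sample = [(0.5, 120), (10.0, 80), (59.9, 200), (125.0, 1500)]
print(bytes_per_window(sample))
```

This only shows volume over time. It says nothing about what is inside the packets.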
That’s kind of what I was wondering, I figured this as well as a way to track that it is at least sending data at unusual times. Someone else in this thread explained that actually determining what that data is would be difficult yeah: https://lemmy.world/post/48510943/24408747
I don’t know enough about system security or forensics to evaluate this, but it does make sense based on what I know.
The consensus so far seems to be that they don’t collect as much data as people think, partly because they can’t process all of it, and partly because educated guesses are good enough to target ads often enough.
I have a memory of people black boxing it and seeing power usage and network traffic that supported the claims, but that was a snapshot in time and, as others note, it’s all proprietary.
It takes a lifetime to build a good reputation, but you can lose it in a minute.
They ship with proprietary code, this would be the point of open source.
In practice in my experience, every company is at least skirting the law regarding privacy, and I never worked for one big enough that could lobby itself out of a fine.
Would this not be detectable by tracking the data sent through your network?
I used to run forensic network capture and analysis tools.
First thing, traffic is encrypted. All you will see is a blob of traffic passing through. You used to see hostnames with TLS, but now with QUIC, you see nothing. This makes it hard.
You could root the phone and install a root CA certificate for a decrypting proxy, and you might see more, but the data itself (not just the transport protocol) could be encoded or even encrypted within the network encapsulation.
Next, you’d have to reverse engineer the protocol if they’re using something nonstandard. Also, malware can often be set up to “behave” when it can detect analysis. I’m all but certain Google would do this.
Maybe you could do statistical analysis of the traffic and attempt to baseline normal vs when it’s transmitting audio. It would be a bit of a blind guess at best.
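A minimal sketch of that baselining idea in Python, assuming you have per-window byte counts from a period you believe is idle and from a period you want to check. The threshold rule (mean plus `k` standard deviations) and all numbers are assumptions for illustration, and a flagged window only means "more traffic than usual", not "audio".

```python
from statistics import mean, stdev


def flag_unusual(baseline, observed, k=3.0):
    """Return indexes of observed windows above mean + k * stdev of the baseline.

    baseline: byte counts per window from a presumed-idle period (needs >= 2 values).
    observed: byte counts per window from the period being checked.
    """
    threshold = mean(baseline) + k * stdev(baseline)
    return [i for i, value in enumerate(observed) if value > threshold]


idle = [1000, 1200, 900, 1100, 1000]      # made-up idle traffic, bytes per minute
during_talk = [1050, 950, 48000, 1300]    # made-up traffic while people were talking
print(flag_unusual(idle, during_talk))
```

As the comment above says, this is a blind guess at best: app updates, sync jobs and push notifications would trip the same threshold.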
If I had more time, I’d love to try it. I have an old Pixel 7 Pro. Maybe I can sort something out.
People have already done that and shown that no, the device isn’t listening to you 24/7 and sending all your data out. There are plenty of papers on the subject, and it makes sense. Why record, decode and analyze all audio when your digital footprint is so much easier to compile and analyze? People aren’t random, so it’s easy to put them into statistical buckets for targeting. Here is one reference paper (of many): https://recon.meddle.mobi/papers/panoptispy18pets.pdf
That works if it’s monitoring you in real time, but not if it’s logging data to send later, when it would be expected to be sending data anyway.
Audio doesn’t take up much space.
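For a rough sense of scale, here is the arithmetic in Python. The 16 kbit/s bitrate is an assumption picked as a plausible figure for compressed speech, not a measurement of what any device uses.

```python
def audio_megabytes(bitrate_kbps, hours):
    """Size in megabytes of audio stored at a constant bitrate."""
    bits = bitrate_kbps * 1000 * hours * 3600
    return bits / 8 / 1_000_000


print(audio_megabytes(16, 1))    # one hour at the assumed bitrate
print(audio_megabytes(16, 24))   # a full day at the assumed bitrate
```

At that assumed bitrate an hour works out to about 7 MB and a full day to about 173 MB, and far less if silence is skipped or only transcribed text is kept.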
Even if it was open source, you’d need to be able to verify what they ship matches the specs. Allowing you to flash whatever you want onto it helps, but you still need to validate the hardware.
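The software half of that check can be sketched as a hash comparison in Python, assuming you can get the shipped image off the device and can build the same image yourself from source (a reproducible build). The byte strings below are stand-ins for real image files, and `same_image` is a made-up helper name.

```python
import hashlib


def same_image(shipped: bytes, built_from_source: bytes) -> bool:
    """Compare SHA-256 digests of the shipped image and your own build."""
    return hashlib.sha256(shipped).digest() == hashlib.sha256(built_from_source).digest()


# Stand-in data; in practice these would be the contents of two image files.
print(same_image(b"firmware v1", b"firmware v1"))
print(same_image(b"firmware v1", b"firmware v1 plus extras"))
```

A match only tells you the software is what the source says it is. It does nothing for the hardware, which is the harder part.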
I don’t know. You’d need to reverse engineer the hardware and software to be confident, and could an OTA update then sneak a bypass in anyway?
Edit: I think Amazon might have abandoned this as well and always records on Echos now too.
Doesn’t need to track you all the time to know exactly who you are and what you’re up to.
Continuously monitoring is such a waste of their resources, they already know everything about you, they just need to check in now and then to make sure you’re buying the correct t-shirts.