1 - 5 of 5 Posts

Information is Ammunition
22,122 Posts
Discussion Starter · #1 ·
Massive Cloud Auditing

Cloud auditing of massive logs requires analyzing data volumes that routinely cross the petascale threshold.
At this scale, the computational and storage requirements of any data analysis methodology increase significantly.
Distributed data mining algorithms and implementation techniques are needed to meet the scalability and performance requirements of such massive data analyses.
Current distributed data mining approaches pose serious performance and effectiveness issues when extracting information from cloud auditing logs.
Reasons include scalability, dynamic and hybrid workloads, high sensitivity, and stringent time constraints.

Traffic Characterization

Cloud traffic logs are accumulated from diverse and geographically disparate sources.
Sources include stored and live traffic from popular applications such as web and email.
Live packet capture from packet-sniffing tools (Wireshark).
Honeypot traffic from UTSA, consisting of malicious traffic.
Augment traffic information with IP, DNS, and geolocation analysis procured from publicly available datasets, plus high-level network and flow statistics.
IP geolocation from public databases retrieves the name and street address of the organization that registered the address block. For large ISPs, the registered street address usually differs from the real location of its hosts.
Measurement-based IP geolocation uses active packet delay measurements to approximate the geographical location of network hosts.
Secure IP geolocation defends against adversaries who manipulate packet delay measurements to forge locations.
Traffic characterization will generate a massive amount of data, hence the need for distributed data storage.
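
To make the measurement-based geolocation idea concrete, here is a rough sketch (not the method from the slides): a round-trip delay to a landmark with a known position puts an upper bound on how far away the host can be, and a claimed location can be checked against those bounds. The 2/3 speed-of-light factor for fiber, the function names, and the landmark coordinates are all assumptions made for illustration.

```python
import math

SPEED_OF_LIGHT_KM_PER_MS = 299.792458  # speed of light in vacuum, km per millisecond
FIBER_FACTOR = 2.0 / 3.0               # assumption: signals travel at roughly 2/3 c in fiber
EARTH_RADIUS_KM = 6371.0               # mean Earth radius


def max_distance_km(rtt_ms):
    """Upper bound on host distance implied by a round-trip time (one-way delay = RTT / 2)."""
    return (rtt_ms / 2.0) * SPEED_OF_LIGHT_KM_PER_MS * FIBER_FACTOR


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_feasible(candidate, measurements):
    """True if the candidate location lies within the delay bound of every landmark.

    measurements: list of ((lat, lon), rtt_ms) pairs, one per landmark.
    """
    return all(haversine_km(candidate, landmark) <= max_distance_km(rtt)
               for landmark, rtt in measurements)


# Hypothetical landmarks and delays, for illustration only.
SAN_ANTONIO = (29.4241, -98.4936)
NEW_YORK = (40.7128, -74.0060)

# A host claiming to be in San Antonio cannot answer New York in 10 ms.
print(is_feasible(SAN_ANTONIO, [(NEW_YORK, 10.0)]))
# A 40 ms round trip is consistent with that claim.
print(is_feasible(SAN_ANTONIO, [(NEW_YORK, 40.0)]))
```

Note that a bound like this only catches hosts that claim to be closer than the delay allows. An adversary can always add delay to look farther away, which is part of what secure geolocation has to deal with.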

Online Data Mining

Develop data mining algorithms that work in a massively parallel yet online fashion for mining large data streams.
Reduce the time between query submission and obtaining results.
Overall speed of query processing depends critically on the query response time.
The Map-Reduce programming model is used for fault-tolerant, massively parallel data crunching.
But Map-Reduce implementations work only in batch mode and allow neither stream processing nor the use of preliminary results.
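
The batch versus online distinction can be sketched in a few lines. This is a single-process toy, not a real Map-Reduce deployment, and the log format (source IP as the first whitespace-separated field) and function names are assumptions for illustration. The batch version returns nothing until every chunk has been reduced, while the online version hands back a running result after each chunk.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count requests per source IP within one chunk of log lines."""
    return Counter(line.split()[0] for line in lines if line.strip())


def batch_count(chunks):
    """Batch mode: map every chunk, then reduce; no result until all input is seen."""
    return reduce(lambda a, b: a + b, (map_chunk(c) for c in chunks), Counter())


def online_count(chunks):
    """Online mode: merge each chunk into a running total and yield a snapshot."""
    running = Counter()
    for chunk in chunks:
        running.update(map_chunk(chunk))
        yield Counter(running)  # copy, so earlier snapshots are not mutated later


# Hypothetical log chunks, for illustration only.
chunks = [
    ["10.0.0.1 GET /", "10.0.0.2 GET /a"],
    ["10.0.0.1 POST /b"],
]

print(batch_count(chunks))
for snapshot in online_count(chunks):
    print(snapshot)  # preliminary result after each chunk
```

The last snapshot from the online version matches the batch result; the difference is that a query can look at the earlier snapshots instead of waiting for the whole job.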

Here's my safety Sir
14,678 Posts

Molon Labe
280 Posts
lol. data mining is nothing new. that's why a lot of this has been outsourced. big companies already do this, and all the gov does is tell companies to give them the info they need in the format they need it in, and they pay them for it.


It's quite widely known that data mining isn't new, nor is it solely used by corporations as an information source.

In-Q-Tel, CIA front and data mining company. The Gov't doesn't need to pay for information, it can collect it all on its own. FB is a great source for them.

The FBI admitted FreeBSD was a bloated botnet it used to collect information, NSA admitted to helping with .NET framework on Win32 machines, Patriot Act, etc.

The fact that the AIR FORCE is openly admitting and even parading this is what's so mind boggling about it. I've come to expect this from alphabet crew agencies but not the military.